Search Results: "shell"

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I'd evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I'll make a comparison post.
What is git-annex?
git-annex is a fantastic and versatile program that does, well, it's one of those things that can do so much that it's a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren't actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily state rules about which files each device should hold.
git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient!
git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives.
git-annex challenges
In my prior post, I related some challenges with git-annex. The biggest of them, quite poor performance of the directory special remote when dealing with many files, has been resolved by Joey, git-annex's author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+ range, but in that order of magnitude it now looks usable. I'm not sure about 1,000,000-file repositories (I haven't tested); there is a page about scalability.
A few other, more minor challenges remain. I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them.
To save:
$ mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec
And, after restoration, the timestamps can be applied with:
$ mtree -t -U -e < /tmp/spec
Walkthrough: initial setup
To use git-annex in this way, we have to do some setup. My general approach relies on a handful of shell variables: $METAREPO for the metadata-only tracking repo, $SOURCEDIR for the source data, and $DRIVE01 and $DRIVE02 for the removable drives.
Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
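The rule just set, include=* and exclude=$REPONAME/*, reads as a path filter. As a rough illustration in plain shell (the hub_wants helper is hypothetical, and this is not how git-annex evaluates preferred content internally), the same decision looks like this:

```shell
#!/bin/sh
# Illustration only: mimics the rule "include=* and exclude=$REPONAME/*"
# with a shell pattern. hub_wants is a made-up helper, not a git-annex command.
REPONAME=testdata

# Prints "wanted" if the hub would keep content for the given path,
# "not wanted" otherwise.
hub_wants() {
    case "$1" in
        "$REPONAME"/*) echo "not wanted" ;;  # excluded: lives under $REPONAME/
        *)             echo "wanted" ;;      # everything else is included
    esac
}

hub_wants mtree
hub_wants "$REPONAME/some/file"
```

The first call falls through to the catch-all branch, the second matches the excluded prefix.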
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says everything except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
$ git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
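The 2 in the second column of that listing is the file's link count. A small sketch in a scratch directory (not a real git-annex repository; the file names are only placeholders) shows how a second hard link produces that number:

```shell
#!/bin/sh
# Sketch: two names for one file, checked through the link count.
# Runs in a throwaway directory; nothing here touches git-annex.
workdir=$(mktemp -d)
echo "payload" > "$workdir/object"
ln "$workdir/object" "$workdir/mtree"    # second hard link to the same inode

# The second field of `ls -l` output is the number of hard links.
links=$(ls -ld "$workdir/mtree" | awk '{print $2}')
echo "link count: $links"

rm -rf "$workdir"
```

On a filesystem that supports hard links this should report a count of 2, matching the listing above.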
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis | less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
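With many files, paging through this output gets tedious. A small filter, sketched here under the assumption that the text layout shown above stays the same (it is not a documented interface, and uncopied is a made-up name), lists only the files that have no copy in any repository:

```shell
#!/bin/sh
# Sketch: list files that `git annex whereis` reports with zero copies.
# Parses the human-readable layout shown above, which is an assumption
# about the output format rather than a stable interface.
uncopied() {
    awk '/^whereis .* \(0 copies\)/ {
        sub(/^whereis /, "")          # drop the leading keyword
        sub(/ \(0 copies\).*$/, "")   # drop the trailing copy count
        print
    }'
}

# Example with canned input instead of a real repository:
printf '%s\n' \
    'whereis mtree (1 copy)' \
    'ok' \
    'whereis testdata/example-file (0 copies)' \
    'failed' | uncopied
```

In a real repository the same function would be fed with git annex whereis | uncopied.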
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them.
Walkthrough: removable drives
I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
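What this command decides for each file can be modeled roughly as two checks: the wanted rule, then the disk reserve. The following is a toy model only; the file names and sizes are invented, fill_drive is a hypothetical helper, and git-annex's real accounting is more detailed:

```shell
#!/bin/sh
# Toy model of one drive filling up. Sizes are in MB and invented for
# illustration; this does not call git-annex.
reserve=100    # mirrors `git config annex.diskreserve 100M`
avail=499      # free space on the example drive

# Reads "name size copies_in_group" lines on stdin and prints the names
# this drive would take, updating $avail as it goes.
fill_drive() {
    while read -r name size copies; do
        # rule '(not copies=driveset01:1)': skip files the group already holds
        [ "$copies" -ge 1 ] && continue
        # disk reserve: skip files that would leave less than the reserve free
        [ $((avail - size)) -lt "$reserve" ] && continue
        avail=$((avail - size))
        echo "$name"
    done
}

fill_drive <<EOF
mtree 1 0
big-a 300 0
big-b 200 0
small-c 50 1
small-d 90 0
EOF
```

In this model big-b is skipped for lack of space and small-c because the group already has a copy, while the others are taken.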
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB | 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis | less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive!
Walkthrough: Adding a second drive
The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
Look at that! Some files on drive01, some on drive02, some in neither place. Perfect!
Walkthrough: Updates
So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis | less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
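Collected in one place, the update round above can be sketched as a shell function. It only strings together the commands already shown; the function name update_drive is made up, and it assumes $METAREPO is set as in the rest of this walkthrough:

```shell
#!/bin/sh
# Sketch of one update round for a single drive, using only the commands
# from the walkthrough. update_drive is a hypothetical helper name.
# Assumes $METAREPO points at the metadata-only tracking repo.
update_drive() {
    drive_repo=$1

    # 1. Let the hub rescan the source and note changes.
    ( cd "$METAREPO" &&
        git annex sync &&
        git annex dropunused all ) || return 1

    # 2. Bring the drive up to date and tidy it.
    ( cd "$drive_repo" &&
        git annex sync --content &&
        git annex get --auto &&
        git annex drop --auto &&
        git annex dropunused all ) || return 1

    # 3. Tell the hub what the drive now holds.
    ( cd "$METAREPO" && git annex sync )
}
```

It would be called as update_drive "$DRIVE01/$REPONAME" with the drive mounted.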
We could do the same for drive02. This is how we would proceed with every update.
Walkthrough: Restoration
Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
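The restore steps can likewise be gathered into one function. This is a sketch that simply replays the commands above in the same order, so it inherits any open questions about that order; restore_tree and the generated remote names are hypothetical, and $REPONAME is assumed to be set as before:

```shell
#!/bin/sh
# Sketch: replay the restore steps for any number of drives.
# restore_tree is a made-up helper; it assumes $REPONAME is set and that
# each argument after the first is the mount point of one drive.
# The body runs in a subshell so the caller's directory is unchanged.
restore_tree() (
    restoredir=$1
    shift

    # Clone from the first drive listed (the most recently used one).
    git clone "$1/$REPONAME" "$restoredir" || exit 1
    cd "$restoredir" || exit 1
    git config annex.thin true
    git annex init "restore"
    git annex adjust --hide-missing --unlock

    # Add each drive as a remote and pull its content in turn.
    n=1
    for drive in "$@"; do
        git remote add "$(printf 'drive%02d' "$n")" "$drive/$REPONAME"
        git annex sync --content
        n=$((n + 1))
    done
)
```

A call would look like restore_tree "$RESTOREDIR" "$DRIVE01" "$DRIVE02", with each drive mounted when its turn comes.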
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought git-annex would have been able to do that automatically.
Conclusions
I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why.
These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

5 June 2023

Reproducible Builds: Reproducible Builds in May 2023

Welcome to the May 2023 report from the Reproducible Builds project! In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.


Holger Levsen gave a talk at the 2023 edition of the Debian Reunion Hamburg, a semi-informal meetup of Debian-related people in northern Germany. The slides are available online.
In April, Holger Levsen gave a talk at foss-north 2023 titled Reproducible Builds, the first ten years. Last month, however, Holger's talk was covered in a round-up of the conference on the Free Software Foundation Europe (FSFE) blog.
Pronnoy Goswami, Saksham Gupta, Zhiyuan Li, Na Meng and Daphne Yao from Virginia Tech published a paper investigating the Reproducibility of NPM Packages. The abstract includes:
When using open-source NPM packages, most developers download prebuilt packages on npmjs.com instead of building those packages from available source, and implicitly trust the downloaded packages. However, it is unknown whether the blindly trusted prebuilt NPM packages are reproducible (i.e., whether there is always a verifiable path from source code to any published NPM package). […] We downloaded versions/releases of 226 most popularly used NPM packages and then built each version with the available source on GitHub. Next, we applied a differencing tool to compare the versions we built against versions downloaded from NPM, and further inspected any reported difference.
The paper reports that among the 3,390 versions of the 226 packages, only 2,087 versions are reproducible, and furthermore that multiple factors contribute to the non-reproducibility, including flexible versioning information in the package.json file and the divergent behaviors between distinct versions of tools used in the build process. The paper concludes with insights for future verifiable build procedures. Unfortunately, a PDF is not publicly available yet, but a Digital Object Identifier (DOI) is available on the paper's IEEE page.
Elsewhere in academia, Betul Gokkaya, Leonardo Aniello and Basel Halak of the School of Electronics and Computer Science at the University of Southampton published a new paper containing a broad overview of attacks and comprehensive risk assessment for software supply chain security. Their paper, titled Software supply chain: review of attacks, risk assessment strategies and security controls, analyses the most common software supply-chain attacks by providing the latest trend of analyzed attack, and identifies the security risks for open-source and third-party software supply chains. Furthermore, their study introduces "unique security controls to mitigate analyzed cyber-attacks and risks by linking them with real-life security incidence and attacks". (arXiv.org, PDF)
NixOS is now tracking two new reports at reproducible.nixos.org. Aside from the collection of build-time dependencies of the minimal and Gnome installation ISOs, this page now also contains reports that are restricted to the artifacts that make it into the image. The minimal ISO is currently reproducible except for Python 3.10, which hopefully will be resolved with the coming update to Python version 3.11.
On our rb-general mailing list this month: David A. Wheeler started a thread noting that the OSSGadget project's oss-reproducible tool was measuring something related to but not the same as reproducible builds. Initially they had adopted the term "semantically reproducible build" for what it measured, which they defined as being "if its build results can be either recreated exactly (a bit for bit reproducible build), or if the differences between the release package and a rebuilt package are not expected to produce functional differences in normal cases." This generated a significant number of replies, and several were concerned that people might confuse what they were measuring with "reproducible builds". After discussion, the OSSGadget developers decided to switch to the term "semantically equivalent" for what they measured in order to reduce the risk of confusion. Vagrant Cascadian (vagrantc) posted an update about GCC, binutils, and Debian's build-essential set with "some progress, some hope, and I daresay, some fears". Lastly, kpcyrd asked a question about building a reproducible Linux kernel package for Arch Linux (answered by Arnout Engelen). In the same thread, David A. Wheeler pointed out that the Linux Kernel documentation has a chapter about Reproducible kernel builds now as well.
In Debian this month, nine reviews of Debian packages were added, 20 were updated and 6 were removed, all adding to our knowledge about identified issues. In addition, Vagrant Cascadian added a link to the source code causing various ecbuild issues.
The F-Droid project updated its Inclusion How-To with a new section explaining why it considers reproducible builds to be best practice and hopes developers will support the team's efforts to make as many (new) apps reproducible as it reasonably can.
In diffoscope development this month, version 242 was uploaded to Debian unstable by Chris Lamb, who also made a number of changes. In addition, Mattia Rizzolo documented how to (re)-produce a binary blob in the code and Vagrant Cascadian updated the version of diffoscope in GNU Guix to 242.
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Holger Levsen uploaded versions 0.7.24 and 0.7.25 to Debian unstable, which added support for Tox versions 3 and 4 with help from Vagrant Cascadian.

Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches. In addition, Jason A. Donenfeld filed a bug (now fixed in the latest alpha version) in the Android issue tracker to report that generateLocaleConfig in Android Gradle Plugin version 8.1.0 generates XML files using non-deterministic ordering, breaking reproducible builds.

Testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In May, a number of changes were made by Holger Levsen:
  • Update the kernel configuration of arm64 nodes to only put required modules in the initrd, saving space in the /boot partition.
  • A huge number of changes to a new tool to document/track Jenkins node maintenance, including adding --fetch, --help, --no-future and --verbose options as well as adding a suite of new actions, such as apt-upgrade, command, deploy-git, rmstamp, etc., in addition to a significant amount of refactoring.
  • Issue warnings if apt has updates to install.
  • Allow Jenkins to run apt get update in maintenance job.
  • Installed bind9-dnsutils on some Ubuntu 18.04 nodes.
  • Fixed the Jenkins shell monitor to correctly deal with little-used directories.
  • Updated the node health check to warn when apt upgrades are available.
  • Performed some node maintenance.
In addition, Vagrant Cascadian added nocheck, nopgo and nolto when building gcc-* and binutils packages as well as performed some node maintenance. Roland Clobus also updated the openQA configuration to specify longer timeouts and access to the developer mode and updated the URL used for reproducible Debian Live images.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.

14 May 2023

C.J. Collier: Early Access: Inserting JSON data to BigQuery from Spark on Dataproc

Hello folks! We recently received a case letting us know that Dataproc 2.1.1 was unable to write to a BigQuery table with a column of type JSON. Although the BigQuery connector for Spark has had support for JSON columns since 0.28.0, the Dataproc images on the 2.1 line still cannot create tables with JSON columns or write to existing tables with JSON columns. The customer has graciously granted permission to share the code we developed to allow this operation. So if you are interested in working with JSON column tables on Dataproc 2.1 please continue reading! Use the following gcloud command to create your single-node Dataproc cluster:
IMAGE_VERSION=2.1.1-debian11
REGION=us-west1
ZONE=${REGION}-a
CLUSTER_NAME=pick-a-cluster-name
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --zone ${ZONE} \
    --single-node \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-type pd-ssd \
    --master-boot-disk-size 50 \
    --image-version ${IMAGE_VERSION} \
    --max-idle=90m \
    --enable-component-gateway \
    --scopes 'https://www.googleapis.com/auth/cloud-platform'
The following file is the Scala code used to write JSON structured data to a BigQuery table using Spark. The file following this one can be executed from your single-node Dataproc cluster.
Main.scala:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Metadata, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.avro
import org.apache.avro.specific
  val env = "x"
  val my_bucket = "cjac-docker-on-yarn"
  val my_table = "dataset.testavro2"
  val spark = env match {
    case "local" =>
      SparkSession
        .builder()
        .config("temporaryGcsBucket", my_bucket)
        .master("local")
        .appName("isssue_115574")
        .getOrCreate()
    case _ =>
      SparkSession
        .builder()
        .config("temporaryGcsBucket", my_bucket)
        .appName("isssue_115574")
        .getOrCreate()
  }
  // create DF with some data
  val someData = Seq(
    Row("""{"name":"name1", "age": 10}""", "id1"),
    Row("""{"name":"name2", "age": 20}""", "id2")
  )
  val schema = StructType(
    Seq(
      StructField("user_age", StringType, true),
      StructField("id", StringType, true)
    )
  )
  val avroFileName = s"gs://${my_bucket}/issue_115574/someData.avro"
  
  val someDF = spark.createDataFrame(spark.sparkContext.parallelize(someData), schema)
  someDF.write.format("avro").mode("overwrite").save(avroFileName)
  val avroDF = spark.read.format("avro").load(avroFileName)
  // set metadata
  val dfJSON = avroDF
    .withColumn("user_age_no_metadata", col("user_age"))
    .withMetadata("user_age", Metadata.fromJson("""{"sqlType":"JSON"}"""))
  dfJSON.show()
  dfJSON.printSchema
  // write to BigQuery
  dfJSON.write.format("bigquery")
    .mode(SaveMode.Overwrite)
    .option("writeMethod", "indirect")
    .option("intermediateFormat", "avro")
    .option("useAvroLogicalTypes", "true")
    .option("table", my_table)
    .save()
repro.sh:
#!/bin/bash
PROJECT_ID=set-yours-here
DATASET_NAME=dataset
TABLE_NAME=testavro2
# We have to remove all of the existing spark bigquery jars from the local
# filesystem, as we will be using the symbols from the
# spark-3.3-bigquery-0.30.0.jar below.  Having existing jar files on the
# local filesystem will result in those symbols having higher precedence
# than the one loaded with the spark-shell.
sudo find /usr -name 'spark*bigquery*jar' -delete
# Remove the table from the bigquery dataset if it exists
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Create the table with a JSON type column
bq mk --table $PROJECT_ID:$DATASET_NAME.$TABLE_NAME \
  user_age:JSON,id:STRING,user_age_no_metadata:STRING
# Load the example Main.scala 
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show the table schema when we use  bq mk --table  and then load the avro
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
# Remove the table so that we can see that the table is created should it not exist
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Dynamically generate a DataFrame, store it to avro, load that avro,
# and write the avro to BigQuery, creating the table if it does not already exist
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show that the table schema does not differ from one created with a bq mk --table
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
Google BigQuery has supported JSON data since October of 2022, but until now, it has not been possible, on generally available Dataproc clusters, to interact with these columns using the Spark BigQuery Connector. JSON column type support was introduced in spark-bigquery-connector release 0.28.0.

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news
Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (e.g. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans), which was announced in August 2021 and where an archive of all issued signatures is publicly accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve: in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed. A popular approach to address this problem is called bootstrappable builds, where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes a somewhat recursive approach to the problem for Haskell, leading to the inadvertently humorous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Courtès wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[…] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program, something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Michael, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Models (LLMs) trained on public data in the same order, designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can be affected by training data and model scale.
Back in February's report we reported on a series of changes to the Sphinx documentation generator that were initiated after attempts to get the alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source of the problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
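The comparison step described above can be illustrated with a minimal sketch. It assumes both builds are available as local files and treats a build as verified only when the SHA-256 digests are identical. The function name builds_match and the file names are hypothetical, and F-Droid's real process has to deal with details (such as APK signatures) that this sketch ignores.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two build artifacts are bit-for-bit identical.
builds_match() {
	ours=$(sha256sum < "$1" | cut -d' ' -f1)
	theirs=$(sha256sum < "$2" | cut -d' ' -f1)
	[ "$ours" = "$theirs" ]
}

# Demonstration with throwaway files standing in for the two builds.
workdir=$(mktemp -d)
printf 'same bytes\n' > "$workdir/upstream.apk"
printf 'same bytes\n' > "$workdir/rebuild.apk"
printf 'other bytes\n' > "$workdir/tampered.apk"

if builds_match "$workdir/upstream.apk" "$workdir/rebuild.apk"; then
	echo "reproduced: the new version can be published"
fi
if ! builds_match "$workdir/upstream.apk" "$workdir/tampered.apk"; then
	echo "mismatch: hold the release"
fi
rm -r "$workdir"
```

Comparing digests rather than the files themselves means the two parties only need to exchange a short string to confirm a match.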
Author and public speaker V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy, covering the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what's inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what's inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. […]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the #reproducible-builds IRC channel, but no solution appears to be in sight for now.

Community news
On our mailing list this month:
Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of "Reproducible Builds, the first ten years".
Lastly, there were a number of updates to our website, including:
  • Chris Lamb attempted a number of ways to try and fix literal {: .lead} appearing in the page [ ][ ][ ], made all the "Back to who is involved" links italic [ ], and corrected the syntax of the _data/sponsors.yml file [ ].
  • Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].
In addition, Mattia Rizzolo moved some old sponsors to a "former" section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian
  • Vagrant Cascadian reported on Debian's build-essential package set, which was "inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general". Vagrant mentioned that: "I have some progress, some hope, and I daresay, some fears". [ ]
  • Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing dinstalls; that is to say, the snapshot service is not capturing 100% of the historical states of the Debian archive. This is relevant to reproducibility because without the availability of historical versions, it becomes impossible to repeat a build at a future date in order to correlate checksums.
  • 20 reviews of Debian packages were added, 21 were updated and 5 were removed this month, adding to our knowledge about identified issues. Chris Lamb added a new build_path_in_line_annotations_added_by_ruby_ragel toolchain issue. [ ]
  • Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope development
diffoscope version 241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well as a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit. [ ]

Testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:
  • Holger Levsen:
    • Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]
    • Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]
    • In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]
    • Updated Arch Linux testing to clean up leftover files in /tmp/archlinux-ci/ after three days. [ ][ ][ ]
    • Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]
    • Update the node health checks to detect failures to end schroot sessions. [ ]
    • Filter out another duplicate contributor from the contributor statistics. [ ]
  • Mattia Rizzolo:



If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

1 May 2023

Paul Wise: FLOSS Activities April 2023

Focus
This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues
  • Security issue in secret manager (sent privately)
  • Broken symlinks in opencpn
  • Test problem in SPTAG
  • Debian migration unblock needed for evolution

Review

Administration
  • Debian IRC: fixed the #debian-mips channel topic
  • Debian wiki: unblock IP addresses, approve accounts
  • Debian QA services: deploy changes, investigate SourceForge uscan issue

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors
The SPTAG work was sponsored. All other work was done on a volunteer basis.

1 April 2023

Paul Wise: FLOSS Activities March 2023

Focus
This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian QA services: disabled updating jessie as it was removed
  • Debian IRC: rescued #debian-s390x from inactive person
  • Debian servers: repair a /etc git repo
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors
The gensim, sptag, purple-discord and harmony work was sponsored. All other work was done on a volunteer basis.

26 March 2023

Emanuele Rocca: EFI and Secure Boot Notes

To create a bootable EFI drive to use with QEMU, first make a disk image and create a vfat filesystem on it.
$ dd if=/dev/zero of=boot.img bs=1M count=512
$ sudo mkfs.vfat boot.img
By default, EFI firmware boots a specific file under /efi/boot/. The name of that file depends on the architecture: for example, on 64-bit x86 systems it is bootx64.efi, while on 64-bit ARM it is bootaa64.efi.
Copy /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi from package grub-efi-amd64-bin to /efi/boot/bootx64.efi on the boot image, and that should be enough to start GRUB.
# mount boot.img /mnt/
# mkdir -p /mnt/efi/boot/
# cp /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /mnt/efi/boot/bootx64.efi
# umount /mnt/
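As a small illustration of the architecture-dependent file name mentioned above, the following sketch maps a machine name (as printed by uname -m) to the default boot file name. The function name efi_boot_name is hypothetical; only the x86-64 and 64-bit ARM names come from this post, the remaining entries are taken from the UEFI specification's list of removable-media file names.

```shell
#!/bin/sh
# Hypothetical helper: default EFI boot file name for a given `uname -m` value.
efi_boot_name() {
	case $1 in
		x86_64|amd64)   echo bootx64.efi ;;
		i?86)           echo bootia32.efi ;;
		aarch64|arm64)  echo bootaa64.efi ;;
		arm*)           echo bootarm.efi ;;
		riscv64)        echo bootriscv64.efi ;;
		*)              echo "unknown architecture: $1" >&2; return 1 ;;
	esac
}

# Show the name for the machine this runs on (ignore unknown architectures).
efi_boot_name "$(uname -m)" || true
```

The aarch64 branch is listed before the arm* wildcard so that arm64 is not mistaken for 32-bit ARM.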
Now get the x86 firmware from package ovmf and start qemu:
$ cp /usr/share/OVMF/OVMF_CODE.fd /tmp/code.fd
$ qemu-system-x86_64 -drive file=/tmp/code.fd,format=raw,if=pflash -cdrom boot.img
GRUB looks fine, but it would be good to have a kernel to boot. Let's add one to boot.img.
# mount boot.img /mnt
# cp vmlinuz-6.1.0-7-amd64 /mnt/vmlinuz
# umount /mnt/
Boot with qemu again, but this time pass -m 1G. The default amount of memory is not enough to boot.
$ qemu-system-x86_64 -drive file=/tmp/code.fd,format=raw,if=pflash -cdrom boot.img -m 1G
At the grub prompt, type the following to boot:
grub> linux /vmlinuz
grub> boot
The kernel will start and reach the point of trying to mount the root fs. This is great but it would now be useful to have some sort of shell access in order to look around. Let's add an initrd!
# mount boot.img /mnt
# cp initrd.img-6.1.0-7-amd64 /mnt/initrd
# umount /mnt/
There's the option of running qemu in the console, so let's try that out. Start qemu with -nographic, and append console=ttyS0 to the kernel command line arguments.
$ qemu-system-x86_64 -drive file=/tmp/code.fd,format=raw,if=pflash -cdrom boot.img -m 1G -nographic
grub> linux /vmlinuz console=ttyS0
grub> initrd /initrd
grub> boot
If all went well we are now in the initramfs shell. We can now run commands! At this point we can see that the system has Secure boot disabled:
(initramfs) dmesg | grep secureboot
[    0.000000] secureboot: Secure boot disabled
In order to boot with Secure boot, we need:
  • a signed shim, grub, and kernel
  • the right EFI variables for Secure boot
The package shim-signed provides a shim signed with Microsoft's key, while grub-efi-amd64-signed has GRUB signed with Debian's key.
The signatures can be shown with sbverify --list:
$ sbverify --list /usr/lib/shim/shimx64.efi.signed
warning: data remaining[823184 vs 948768]: gaps between PE/COFF sections?
signature 1
image signature issuers:
 - /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
image signature certificates:
 - subject: /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Windows UEFI Driver Publisher
   issuer:  /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
 - subject: /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation UEFI CA 2011
   issuer:  /C=US/ST=Washington/L=Redmond/O=Microsoft Corporation/CN=Microsoft Corporation Third Party Marketplace Root
Similarly for GRUB and the kernel:
$ sbverify --list /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
signature 1
image signature issuers:
 - /CN=Debian Secure Boot CA
image signature certificates:
 - subject: /CN=Debian Secure Boot Signer 2022 - grub2
   issuer:  /CN=Debian Secure Boot CA
$ sbverify --list /mnt/vmlinuz
signature 1
image signature issuers:
 - /CN=Debian Secure Boot CA
image signature certificates:
 - subject: /CN=Debian Secure Boot Signer 2022 - linux
   issuer:  /CN=Debian Secure Boot CA
Let's use the signed shim and grub in the boot image:
# mount boot.img /mnt
# cp /usr/lib/shim/shimx64.efi.signed /mnt/efi/boot/bootx64.efi
# cp /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed /mnt/efi/boot/grubx64.efi
# umount /mnt
And start QEMU with the appropriate EFI variables for Secure boot:
$ cp /usr/share/OVMF/OVMF_VARS.ms.fd /tmp/vars.fd
$ qemu-system-x86_64 -drive file=/tmp/code.fd,format=raw,if=pflash -drive file=/tmp/vars.fd,format=raw,if=pflash -cdrom boot.img -m 1G -nographic
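Since the QEMU invocation has now grown in several steps, a small wrapper can make the difference between the two variants easier to see. This is only a sketch: the function name qemu_cmd is hypothetical, it prints the command line instead of running it, and it assumes paths without spaces.

```shell
#!/bin/sh
# Hypothetical helper: print (rather than run) the QEMU command line used in this post.
# $1: firmware code image, $2: boot image, $3: optional EFI variables image.
qemu_cmd() {
	cmd="qemu-system-x86_64 -drive file=$1,format=raw,if=pflash"
	if [ -n "${3:-}" ]; then
		# A second pflash drive carries the EFI variables (e.g. Secure boot keys).
		cmd="$cmd -drive file=$3,format=raw,if=pflash"
	fi
	echo "$cmd -cdrom $2 -m 1G -nographic"
}

qemu_cmd /tmp/code.fd boot.img               # without Secure boot variables
qemu_cmd /tmp/code.fd boot.img /tmp/vars.fd  # with Secure boot variables
```

The only difference between the two printed commands is the extra -drive option for the variables file.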
We can double-check in the firmware settings if Secure boot is indeed enabled. At the GRUB prompt, type fwsetup:
grub> fwsetup
Check under "Device Manager" → "Secure Boot Configuration" that "Attempt Secure Boot" is selected, then boot from GRUB as before. If all went well, the kernel should confirm that we have booted with Secure boot:
(initramfs) dmesg | grep secureboot
[    0.000000] secureboot: Secure boot enabled
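For scripted checks, the same kernel log line can be turned into a simple state string. The sketch below is a hypothetical helper (secureboot_state is not part of any package) that reads kernel log text on standard input and assumes the message wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "enabled", "disabled" or "unknown" based on the secureboot line.
secureboot_state() {
	case $(grep 'secureboot:' | tail -n 1) in
		*"Secure boot enabled"*)  echo enabled ;;
		*"Secure boot disabled"*) echo disabled ;;
		*)                        echo unknown ;;
	esac
}

# On a real system this would be: dmesg | secureboot_state
printf '%s\n' '[    0.000000] secureboot: Secure boot enabled' | secureboot_state
```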

13 March 2023

Russell Coker: Firebuild

After reading Bálint's blog post about Firebuild (a compile cache) [1] I decided to give it a go. It's non-free; the project web site [2] says that it's free for non-commercial use or commercial trials. My first attempt at building a Debian package failed due to man-recode using a seccomp() sandbox. I filed Debian bug #1032619 [3] about this (thanks for the quick response, Bálint). The solution for me was to edit /etc/firebuild.conf and add man-recode to the dont_intercept list. The new version that's just been uploaded to Debian fixes it by disabling seccomp() and will presumably allow slightly better performance.
Here are the results of building the refpolicy package: a regular build, the first build with Firebuild (about 35% slower) and a rebuild with Firebuild that reduced the time by almost 42%.
real    1m32.026s
user    4m20.200s
sys     2m33.324s
real    2m4.111s
user    6m31.769s
sys     3m53.681s
real    0m53.632s
user    1m41.334s
sys     3m36.227s
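The relative differences between these real times can be recomputed with a short sketch. The helper names to_seconds and pct_change are hypothetical; the durations are the real values listed above.

```shell
#!/bin/sh
# Convert a duration printed by `time` (e.g. 1m32.026s) to seconds.
to_seconds() {
	echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Percentage change of the second duration relative to the first.
pct_change() {
	awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
		'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
}

pct_change 1m32.026s 2m4.111s    # first Firebuild run versus the regular build
pct_change 1m32.026s 0m53.632s   # Firebuild rebuild versus the regular build
```

A positive result means the second build was slower than the first, a negative one that it was faster.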
Next I did a test of building a Linux 6.1.10 kernel with make bzImage -j18. Here are the results from a normal build, the first build with firebuild, and the second build. The real time is worse with firebuild for this on my machine. I think that the relative speeds of my CPU (reasonably fast 18 core) and storage (two of the slower NVMe devices in a BTRFS RAID-1) are the cause of the first build being relatively so much slower for make bzImage than for building refpolicy, as the kernel build process involves a lot more data. For the final build I moved ~/.cache/firebuild to a tmpfs (I have 128G of RAM and not much was running on my machine at the time of the tests); even then building with firebuild was slightly slower in real time but took significantly less CPU time (user+sys being about 20 minutes instead of 36). I also ran several tests with the kernel source tree on a tmpfs but for unknown reasons those tests each took about 6 minutes. Does firebuild or the Linux kernel build process dislike tmpfs for some reason?
real    2m43.020s
user    31m30.551s
sys     5m15.279s
real    8m49.675s
user    64m11.258s
sys     19m39.016s
real    3m6.858s
user    7m47.556s
sys     9m22.513s
real    2m51.910s
user    10m53.870s
sys     9m21.307s
One thing I noticed from the kernel build tests is that the total CPU time taken by the firebuild process (as reported by ps) was more than 2/3 of the run time, and top usually reported it as taking around 75% of a CPU core. It seems to me that the firebuild process itself is a bottleneck on build speed. Building refpolicy without firebuild has an average of 4.5 cores in use while building the kernel has 13.5. Unless they make a multi-threaded version of firebuild it seems that it won't give the performance one would hope for from a CPU with 18+ cores. I presume that if I had been running with hyper-threading enabled then firebuild would have been even worse for kernel builds as it would sometimes get on the second thread of a core. It looks like firebuild would perform better on AMD CPUs as they tend to have fewer CPU cores with greater average performance per core, so a single CPU core for firebuild will be less limited. I presume that the firebuild developers will make it perform better with large numbers of cores in future; the latest Intel laptop CPUs have 16+ cores and servers with 2*40core CPUs are common.
The performance improvement for refpolicy is significant as a portion of build time, but insignificant in terms of real time. A full build of refpolicy doesn't take enough time to get a Coke, so reducing it doesn't offer a huge benefit. If Firebuild had been available in past years when refpolicy took 20 minutes to build (when DDR2 was the best RAM available) then it would be a different story.
There is some potential to optimise the build of refpolicy for the non-firebuild case. Getting it to average more than 4.5 cores in use when there are 18 available should be possible; there are a number of shell for loops in the main Makefile and maybe some of them can be replaced by make constructs to allow running in parallel. If it used 7 cores on average then it would be faster in a regular build than it currently is with firebuild and a hot cache.
Any advice from make experts would be appreciated.
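The average core counts quoted above appear to follow from dividing the CPU time (user plus sys) by the elapsed (real) time. The sketch below recomputes them from the regular build timings; the helper names to_seconds and avg_cores are hypothetical.

```shell
#!/bin/sh
# Convert a duration printed by `time` (e.g. 1m32.026s) to seconds.
to_seconds() {
	echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Average number of busy cores: (user + sys) / real.
# Arguments: real, user, sys as printed by `time`.
avg_cores() {
	awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" -v s="$(to_seconds "$3")" \
		'BEGIN { printf "%.1f\n", (u + s) / r }'
}

avg_cores 1m32.026s 4m20.200s 2m33.324s    # refpolicy, regular build
avg_cores 2m43.020s 31m30.551s 5m15.279s   # kernel, regular build
```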

5 March 2023

Reproducible Builds: Reproducible Builds in February 2023

Welcome to the February 2023 report from the Reproducible Builds project. As ever, if you are interested in contributing to our project, please visit the Contribute page on our website.
FOSDEM 2023 was held in Brussels on the 4th & 5th of February and featured a number of talks related to reproducibility. In particular, Akihiro Suda gave a talk titled Bit-for-bit reproducible builds with Dockerfile discussing deterministic timestamps and deterministic apt-get (original announcement). There was also an entire track of talks on Software Bill of Materials (SBOMs). SBOMs are an inventory for software with the intention of increasing the transparency of software components (the US National Telecommunications and Information Administration (NTIA) published a useful Myths vs. Facts document in 2021).
On our mailing list this month, Larry Doolittle was puzzled why the Debian verilator package was not reproducible [ ], but Chris Lamb pointed out that this was due to the use of Python's datetime.fromtimestamp over datetime.utcfromtimestamp [ ].
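The underlying problem is that rendering a timestamp in local time makes the output depend on the builder's timezone, while rendering it in UTC does not. A shell analogue of the Python issue can be sketched as follows; it assumes GNU date (for the -d @SECONDS syntax), and the timestamp is an arbitrary example value.

```shell
#!/bin/sh
# 2023-03-01 00:00:00 UTC, standing in for a SOURCE_DATE_EPOCH-style timestamp.
stamp=1677628800

# Local-time rendering: the result changes with the builder's TZ setting.
# (POSIX TZ strings have an inverted sign: "UTC+12" is twelve hours behind UTC.)
TZ=UTC+12 date -d "@$stamp" +%Y-%m-%d
TZ=UTC-14 date -d "@$stamp" +%Y-%m-%d

# UTC rendering: identical everywhere, whatever TZ is set to.
TZ=UTC+12 date -u -d "@$stamp" +%Y-%m-%d
TZ=UTC-14 date -u -d "@$stamp" +%Y-%m-%d
```

The first pair of commands prints two different dates for the same instant, which is exactly the kind of variation that makes a build unreproducible.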
James Addison also was having issues with a Debian package: in this case, the alembic package. Chris Lamb was also able to identify the Sphinx documentation generator as the cause of the problem, and provided a potential patch that might fix it. This was later filed upstream [ ].
Anthony Harrison wrote to our list twice, first to introduce himself and his background and later to mention the increasing relevance of Software Bill of Materials (SBOMs):
As I am sure everyone is aware, there is a growing interest in [SBOMs] as a way of improving software security and resilience. In the last two years, the US through the Exec Order, the EU through the proposed Cyber Resilience Act (CRA) and this month the UK has issued a consultation paper looking at software security and SBOMs appear very prominently in each publication. [ ]

Tim Retout wrote a blog post discussing AlmaLinux in the context of CentOS, RHEL and supply-chain security in general [ ]:
Alma are generating and publishing Software Bill of Material (SBOM) files for every package; these are becoming a requirement for all software sold to the US federal government. What's more, they are sending these SBOMs to a third party (CodeNotary) who store them in some sort of Merkle tree system to make it difficult for people to tamper with later. This should theoretically allow end users of the distribution to verify the supply chain of the packages they have installed?

Debian

F-Droid & Android

diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb released versions 235 and 236; Mattia Rizzolo later released version 237. Contributions include:
  • Chris Lamb:
    • Fix compatibility with PyPDF2 (re. issue #331) [ ][ ][ ].
    • Fix compatibility with ImageMagick version 7.1 [ ].
    • Require at least version 23.1.0 to run the Black source code tests [ ].
    • Update debian/tests/control after merging changes from others [ ].
    • Don't write test data during a test [ ].
    • Update copyright years [ ].
    • Merged a large number of changes from others.
  • Akihiro Suda edited the .gitlab-ci.yml configuration file to ensure that versioned tags are pushed to the container registry [ ].
  • Daniel Kahn Gillmor provided a way to migrate from PyPDF2 to pypdf (#1029741).
  • Efraim Flashner updated the tool metadata for isoinfo on GNU Guix [ ].
  • FC Stegerman added support for Android resources.arsc files [ ], improved a number of file-matching regular expressions [ ][ ] and added support for Android dexdump [ ]; they also fixed a test failure (#1031433) caused by Debian's black package having been updated to a newer version.
  • Mattia Rizzolo:
    • updated the release documentation [ ],
    • fixed a number of Flake8 errors [ ][ ],
    • updated the autopkgtest configuration to only install aapt and dexdump on architectures where they are available [ ], making sure that the latest diffoscope release is a good fit for the upcoming Debian bookworm freeze.

reprotest
Reprotest version 0.7.23 was uploaded to both PyPI and Debian unstable, including the following changes:
  • Holger Levsen improved a lot of documentation [ ][ ][ ], tidied the documentation as well [ ][ ], and experimented with a new --random-locale flag [ ].
  • Vagrant Cascadian adjusted reprotest to no longer randomise the build locale and use a UTF-8 supported locale instead [ ] (re. #925879, #1004950), and to also support passing --vary=locales.locale=LOCALE to specify the locale to vary [ ].
Separate to this, Vagrant Cascadian started a thread on our mailing list questioning the future development and direction of reprotest.

Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In February, the following changes were made by Holger Levsen:
  • Add three new OSUOSL nodes [ ][ ][ ] and decommission the osuosl174 node [ ].
  • Change the order of listed Debian architectures to show the 64-bit ones first [ ].
  • Reduce the frequency that the Debian package sets and dd-list HTML pages update [ ].
  • Sort Tested suite consistently (and Debian unstable first) [ ].
  • Update the Jenkins shell monitor script to only query disk statistics every 230min [ ] and improve the documentation [ ][ ].

Other development work
disorderfs version 0.5.11-3 was uploaded by Holger Levsen, fixing a number of issues with the manual page [ ][ ][ ].
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

28 February 2023

Paul Wise: FLOSS Activities Feb 2023

Focus
This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian BTS: unarchive/reopen/triage bugs for reintroduced package servefile
  • Debian IRC: turn an old channel into a redirect to the right one
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors
The pyemd/sptag work was sponsored. All other work was done on a volunteer basis.

9 February 2023

Jonathan McDowell: Building a read-only Debian root setup: Part 2

This is the second part of how I build a read-only root setup for my router. You might want to read part 1 first, which covers the initial boot and general overview of how I tie the pieces together. This post will describe how I build the squashfs image that forms the main filesystem. Most of the build is driven from a script, make-router, which I'll dissect below. It's highly tailored to my needs, and this is a fairly lengthy post, but hopefully the steps I describe prove useful to anyone trying to do something similar.
Breakdown of make-router
#!/bin/bash
# Either rb3011 (arm) or rb5009 (arm64)
#HOSTNAME="rb3011"
HOSTNAME="rb5009"
if [ "x${HOSTNAME}" == "xrb3011" ]; then
	ARCH=armhf
elif [ "x${HOSTNAME}" == "xrb5009" ]; then
	ARCH=arm64
else
	echo "Unknown host: ${HOSTNAME}"
	exit 1
fi

It's a bash script, and I allow building for either my RB3011 or RB5009, which means a different architecture (32 vs 64 bit). I run this script on my Pi 4 which means I don't have to mess about with QemuUserEmulation.
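The host-to-architecture selection can also be written as a case statement, which scales a little more easily if more routers are added. This is an alternative sketch, not the script's actual code; the function name arch_for_host and the variable ROUTER are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Debian architecture for a known router name.
arch_for_host() {
	case $1 in
		rb3011) echo armhf ;;
		rb5009) echo arm64 ;;
		*)      echo "Unknown host: $1" >&2; return 1 ;;
	esac
}

ROUTER="rb5009"
ARCH=$(arch_for_host "$ROUTER") || exit 1
echo "Building for $ROUTER ($ARCH)"
```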
BASE_DIR=$(dirname $0)
IMAGE_FILE=$(mktemp --tmpdir router.${ARCH}.XXXXXXXXXX.img)
MOUNT_POINT=$(mktemp -p /mnt -d router.${ARCH}.XXXXXXXXXX)
# Build and mount an ext4 image file to put the root file system in
dd if=/dev/zero bs=1 count=0 seek=1G of=${IMAGE_FILE}
mkfs -t ext4 ${IMAGE_FILE}
mount -o loop ${IMAGE_FILE} ${MOUNT_POINT}

I build the image in a loopback ext4 file on tmpfs (my Pi4 is the 8G model), which makes things a bit faster.
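The dd invocation above copies no data at all: count=0 writes nothing and seek only sets the file length, so the result is a sparse file that starts out occupying almost no space. The sketch below shows this on a smaller scale; the function name make_sparse_image is hypothetical and the behaviour assumes GNU dd.

```shell
#!/bin/sh
# Hypothetical helper: create a sparse file $1 of apparent size $2 (e.g. 1M, 1G).
make_sparse_image() {
	dd if=/dev/zero bs=1 count=0 seek="$2" of="$1" 2>/dev/null
}

img=$(mktemp)
make_sparse_image "$img" 1M
echo "apparent size: $(wc -c < "$img") bytes"
echo "allocated:     $(du -k "$img" | cut -f1) KiB"
rm -f "$img"
```

The apparent size is the full requested length, while the allocated size reported by du stays small until data is actually written.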
# Add dpkg excludes
mkdir -p ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/
cat <<EOF > ${MOUNT_POINT}/etc/dpkg/dpkg.cfg.d/path-excludes
# Exclude docs
path-exclude=/usr/share/doc/*
# Only locale we want is English
path-exclude=/usr/share/locale/*
path-include=/usr/share/locale/en*/*
path-include=/usr/share/locale/locale.alias
# No man pages
path-exclude=/usr/share/man/*
EOF

Create a dpkg excludes config to drop docs, man pages and most locales before we even start the bootstrap.
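The interplay of path-exclude and path-include relies on dpkg processing the rules in order, with the last matching rule deciding. The following sketch only illustrates that rule; it is not dpkg's implementation, the function name path_verdict is hypothetical, and the example paths are made up.

```shell
#!/bin/sh
# Rules in the same order as the path-excludes file above.
RULES='exclude /usr/share/doc/*
exclude /usr/share/locale/*
include /usr/share/locale/en*/*
include /usr/share/locale/locale.alias
exclude /usr/share/man/*'

# Hypothetical helper: print "keep" or "drop" for one path.
# Every rule is checked in order and the last matching rule decides.
path_verdict() (
	verdict=keep
	while read -r action pattern; do
		# Unquoted $pattern is used as a glob; in case patterns "*" also matches "/".
		case $1 in
			$pattern)
				if [ "$action" = include ]; then verdict=keep; else verdict=drop; fi
				;;
		esac
	done <<EOF
$RULES
EOF
	echo "$verdict"
)

path_verdict /usr/share/locale/de/LC_MESSAGES/apt.mo
path_verdict /usr/share/locale/en_GB/LC_MESSAGES/apt.mo
```

The English locale file is first excluded by the broad locale rule and then re-included by the later, more specific one, which is why the order of the lines in the config matters.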
# Setup fstab + mtab
echo "# Empty fstab as root is pre-mounted" > ${MOUNT_POINT}/etc/fstab
ln -s ../proc/self/mounts ${MOUNT_POINT}/etc/mtab
# Setup hostname
echo ${HOSTNAME} > ${MOUNT_POINT}/etc/hostname
# Add the root SSH keys
mkdir -p ${MOUNT_POINT}/root/.ssh/
cat <<EOF > ${MOUNT_POINT}/root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAv8NkUeVdsVdegS+JT9qwFwiHEgcC9sBwnv6RjpH6I4d3im4LOaPOatzneMTZlH8Gird+H4nzluciBr63hxmcFjZVW7dl6mxlNX2t/wKvV0loxtEmHMoI7VMCnrWD0PyvwJ8qqNu9cANoYriZRhRCsBi27qPNvI741zEpXN8QQs7D3sfe4GSft9yQplfJkSldN+2qJHvd0AHKxRdD+XTxv1Ot26+ZoF3MJ9MqtK+FS+fD9/ESLxMlOpHD7ltvCRol3u7YoaUo2HJ+u31l0uwPZTqkPNS9fkmeCYEE0oXlwvUTLIbMnLbc7NKiLgniG8XaT0RYHtOnoc2l2UnTvH5qsQ== noodles@earth.li
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDQb9+qFemcwKhey3+eTh5lxp+3sgZXW2HQQEZMt9hPvVXk+MiiNMx9WUzxPJnwXqlmmVdKsq+AvjA0i505Pp8fIj5DdUBpSqpLghmzpnGuob7SSwXYj+352hjD52UC4S0KMKbIaUpklADgsCbtzhYYc4WoO8F7kK63tS5qa1XSZwwRwPbYOWBcNocfr9oXCVWD9ismO8Y0l75G6EyW8UmwYAohDaV83pvJxQerYyYXBGZGY8FNjqVoOGMRBTUcLj/QTo0CDQvMtsEoWeCd0xKLZ3gjiH3UrknkaPra557/TWymQ8Oh15aPFTr5FvKgAlmZaaM0tP71SOGmx7GpCsP4jZD1Xj/7QMTAkLXb+Ou6yUOVM9J4qebdnmF2RGbf1bwo7xSIX6gAYaYgdnppuxqZX1wyAy+A2Hie4tUjMHKJ6OoFwBsV1sl+3FobrPn6IuulRCzsq2aLqLey+PHxuNAYdSKo7nIDB3qCCPwHlDK52WooSuuMidX4ujTUw7LDTia9FxAawudblxbrvfTbg3DsiDBAOAIdBV37HOAKu3VmvYSPyqT80DEy8KFmUpCEau59DID9VERkG6PWPVMiQnqgW2Agn1miOBZeIQV8PFjenAySxjzrNfb4VY/i/kK9nIhXn92CAu4nl6D+VUlw+IpQ8PZlWlvVxAtLonpjxr9OTw== noodles@yubikey
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC0I8UHj4IpfqUcGE4cTvLB0d2xmATSUzqtxW6ZhGbZxvQDKJesVW6HunrJ4NFTQuQJYgOXY/o82qBpkEKqaJMEFHTCjcaj3M6DIaxpiRfQfs0nhtzDB6zPiZn9Suxb0s5Qr4sTWd6iI9da72z3hp9QHNAu4vpa4MSNE+al3UfUisUf4l8TaBYKwQcduCE0z2n2FTi3QzmlkOgH4MgyqBBEaqx1tq7Zcln0P0TYZXFtrxVyoqBBIoIEqYxmFIQP887W50wQka95dBGqjtV+d8IbrQ4pB55qTxMd91L+F8n8A6nhQe7DckjS0Xdla52b9RXNXoobhtvx9K2prisagsHT noodles@cup
ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBK6iGog3WbNhrmrkglNjVO8/B6m7mN6q1tMm1sXjLxQa+F86ETTLiXNeFQVKCHYrk8f7hK0d2uxwgj6Ixy9k0Cw= noodles@sevai
EOF

Setup fstab, the hostname and SSH keys for root.
# Bootstrap our install
debootstrap \
	--arch=${ARCH} \
	--include=collectd-core,conntrack,dnsmasq,ethtool,iperf3,kexec-tools,mosquitto,mtd-utils,mtr-tiny,ppp,tcpdump,rng-tools5,ssh,watchdog,wget \
	--exclude=dmidecode,isc-dhcp-client,isc-dhcp-common,makedev,nano \
	bullseye ${MOUNT_POINT} https://deb.debian.org/debian/

Actually do the debootstrap step, including a bunch of extra packages that we want.
# Install mqtt-arp
cp ${BASE_DIR}/debs/mqtt-arp_1_${ARCH}.deb ${MOUNT_POINT}/tmp
chroot ${MOUNT_POINT} dpkg -i /tmp/mqtt-arp_1_${ARCH}.deb
rm ${MOUNT_POINT}/tmp/mqtt-arp_1_${ARCH}.deb
# Frob the mqtt-arp config so it starts after mosquitto
sed -i -e 's/After=.*/After=mosquitto.service/' ${MOUNT_POINT}/lib/systemd/system/mqtt-arp.service

I haven't uploaded mqtt-arp to Debian, so I install a locally built package, and ensure it starts after mosquitto (the MQTT broker), given they're running on the same host.
# Frob watchdog so it starts earlier than multi-user
sed -i -e 's/After=.*/After=basic.target/' ${MOUNT_POINT}/lib/systemd/system/watchdog.service
# Make sure the watchdog is poking the device file
sed -i -e 's/^#watchdog-device/watchdog-device/' ${MOUNT_POINT}/etc/watchdog.conf

Watchdog timeouts were a particular issue on the RB3011, where the default timeout didn't give enough time to reach multi-user mode before it would reset the router. Not helpful, so alter the config to start the watchdog earlier (and make sure it's configured to actually kick the device file).
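To see what the two sed expressions do, they can be run as filters on sample input instead of editing files in place. The function names and the sample file contents below are hypothetical stand-ins for the real watchdog.service and watchdog.conf.

```shell
#!/bin/sh
# Hypothetical filter: rewrite the After= ordering of a unit file read from stdin.
start_after() {
	sed -e "s/After=.*/After=$1/"
}

# Hypothetical filter: uncomment the watchdog-device setting.
enable_watchdog_device() {
	sed -e 's/^#watchdog-device/watchdog-device/'
}

# Made-up unit file fragment.
printf '[Unit]\nDescription=watchdog daemon\nAfter=multi-user.target\n' | start_after basic.target

# Made-up watchdog.conf fragment with the device line commented out.
printf '#watchdog-device = /dev/watchdog\n' | enable_watchdog_device
```

Note that the After= pattern is not anchored, so it would also rewrite any other line containing that text; that is fine for a unit file with a single After= line.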
# Clean up docs + locales
rm -r ${MOUNT_POINT}/usr/share/doc/*
rm -r ${MOUNT_POINT}/usr/share/man/*
for dir in ${MOUNT_POINT}/usr/share/locale/*/; do
	if [ "${dir}" != "${MOUNT_POINT}/usr/share/locale/en/" ]; then
		rm -r ${dir}
	fi
done

Clean up any docs etc that ended up installed.
# Set root password to root
echo "root:root" | chroot ${MOUNT_POINT} chpasswd

The only remote login method is an SSH key to the root account, though I suppose a fixed root password allows someone to execute a privilege escalation from a daemon user, so I should probably randomise it. It does need to be known though, so it's possible to log in via the serial console for debugging.
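One way the randomisation mentioned above could look is sketched below. It is not part of make-router: the variable name ROOT_PW is hypothetical, and the password is printed at build time so that it is still known for serial console logins.

```shell
#!/bin/sh
# 18 random bytes encode to exactly 24 base64 characters.
ROOT_PW=$(head -c 18 /dev/urandom | base64)
echo "root password for this image: ${ROOT_PW}"

# In make-router this would replace the fixed password (needs root and the
# chroot from the script, so it is left commented out here):
# echo "root:${ROOT_PW}" | chroot ${MOUNT_POINT} chpasswd
```

Base64 output never contains a colon, so it is safe to use in the user:password format that chpasswd reads.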
# Add security to sources.list + update
echo "deb https://security.debian.org/debian-security bullseye-security main" >> ${MOUNT_POINT}/etc/apt/sources.list
chroot ${MOUNT_POINT} apt update
chroot ${MOUNT_POINT} apt -y full-upgrade
chroot ${MOUNT_POINT} apt clean
# Cleanup the APT lists
rm ${MOUNT_POINT}/var/lib/apt/lists/www.*
rm ${MOUNT_POINT}/var/lib/apt/lists/security.*

Pull in any security updates, then clean out the APT lists rather than polluting the image with them.
# Disable the daily APT timer
rm ${MOUNT_POINT}/etc/systemd/system/timers.target.wants/apt-daily.timer
# Disable daily dpkg backup
cat <<EOF > ${MOUNT_POINT}/etc/cron.daily/dpkg
#!/bin/sh
# Don't do the daily dpkg backup
exit 0
EOF
# We don't want a persistent systemd journal
rmdir ${MOUNT_POINT}/var/log/journal

None of these make sense on a router.
# Enable nftables
ln -s /lib/systemd/system/nftables.service \
	${MOUNT_POINT}/etc/systemd/system/sysinit.target.wants/nftables.service

Ensure we have firewalling enabled automatically.
# Add systemd-coredump + systemd-timesync user / group
echo "systemd-timesync:x:998:" >> ${MOUNT_POINT}/etc/group
echo "systemd-coredump:x:999:" >> ${MOUNT_POINT}/etc/group
echo "systemd-timesync:!*::" >> ${MOUNT_POINT}/etc/gshadow
echo "systemd-coredump:!*::" >> ${MOUNT_POINT}/etc/gshadow
echo "systemd-timesync:x:998:998:systemd Time Synchronization:/:/usr/sbin/nologin" >> ${MOUNT_POINT}/etc/passwd
echo "systemd-coredump:x:999:999:systemd Core Dumper:/:/usr/sbin/nologin" >> ${MOUNT_POINT}/etc/passwd
echo "systemd-timesync:!*:47358::::::" >> ${MOUNT_POINT}/etc/shadow
echo "systemd-coredump:!*:47358::::::" >> ${MOUNT_POINT}/etc/shadow
# Create /etc/.pwd.lock, otherwise it'll end up in the overlay
touch ${MOUNT_POINT}/etc/.pwd.lock
chmod 600 ${MOUNT_POINT}/etc/.pwd.lock

Create a number of users that will otherwise get created at boot, and a lock file that will otherwise get created anyway.
# Copy config files
cp --recursive --preserve=mode,timestamps ${BASE_DIR}/etc/* ${MOUNT_POINT}/etc/
cp --recursive --preserve=mode,timestamps ${BASE_DIR}/etc-${ARCH}/* ${MOUNT_POINT}/etc/
chroot ${MOUNT_POINT} chown mosquitto /etc/mosquitto/mosquitto.users
chroot ${MOUNT_POINT} chown mosquitto /etc/ssl/mqtt.home.key

There are config files that are easier to replace wholesale, some of which are specific to the hardware (e.g. related to network interfaces). See below for some more details.
# Build symlinks into flash for boot / modules
ln -s /mnt/flash/lib/modules ${MOUNT_POINT}/lib/modules
rmdir ${MOUNT_POINT}/boot
ln -s /mnt/flash/boot ${MOUNT_POINT}/boot

The kernel + its modules live outside the squashfs image, on the USB flash drive that the image lives on. That makes for easier kernel upgrades.
# Put our git revision into os-release
echo -n "GIT_VERSION=" >> ${MOUNT_POINT}/etc/os-release
(cd ${BASE_DIR} ; git describe --tags) >> ${MOUNT_POINT}/etc/os-release

Always helpful to be able to check the image itself for what it was built from.
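On a running image the recorded revision can be read back from os-release. The following is a minimal sketch, assuming the GIT_VERSION line was appended as in the step above; the function name and its optional file argument are illustrative and not part of the build script.

```shell
# Print the GIT_VERSION value recorded in an os-release style file.
# The optional file argument is a hypothetical convenience for testing;
# it defaults to /etc/os-release.
image_git_version() {
	file="${1:-/etc/os-release}"
	# Use the last GIT_VERSION= line in case the file was appended to twice.
	sed -n 's/^GIT_VERSION=//p' "$file" | tail -n 1
}
```

Calling image_git_version with no arguments on the router prints whatever git describe --tags returned at build time.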
# Add some stuff to root's .bashrc
cat << EOF >> ${MOUNT_POINT}/root/.bashrc
alias ls='ls -F --color=auto'
eval "\$(dircolors)"
case "\$TERM" in
xterm*|rxvt*)
	PS1="\\[\\e]0;\\u@\\h: \\w\a\\]\$PS1"
	;;
*)
	;;
esac
EOF

Just some niceties for when I do end up logging in.
# Build the squashfs
mksquashfs ${MOUNT_POINT} /tmp/router.${ARCH}.squashfs \
	-comp xz

Actually build the squashfs image.
# Save the installed package list off
chroot ${MOUNT_POINT} dpkg --get-selections > /tmp/wip-installed-packages

Save off the installed package list. This was particularly useful when trying to replicate the existing router setup and making sure I had all the important packages installed. It doesn't really serve a purpose now.
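As an illustration of how such a list can help when replicating a setup, the sketch below prints the packages that one selections file marks as installed but another lacks. The function name is hypothetical, and both inputs are assumed to be in the format written by dpkg --get-selections.

```shell
# Print packages marked "install" in the first selections file that are
# not marked "install" in the second. Both files are expected to be in
# `dpkg --get-selections` format (package name, whitespace, state).
missing_packages() {
	want=$(mktemp) || return 1
	have=$(mktemp) || { rm -f "$want"; return 1; }
	awk '$2 == "install" { print $1 }' "$1" | LC_ALL=C sort -u > "$want"
	awk '$2 == "install" { print $1 }' "$2" | LC_ALL=C sort -u > "$have"
	# comm -23 keeps only the lines unique to the first (sorted) input.
	LC_ALL=C comm -23 "$want" "$have"
	rm -f "$want" "$have"
}
```

For example, missing_packages old-router-packages /tmp/wip-installed-packages would list what the new image still lacks, where old-router-packages is a placeholder for a list saved from the existing router.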
In terms of the config files I copy into /etc, shared across both routers are the following:
Breakdown of shared config
  • apt config (disable recommends, periodic updates):
    • apt/apt.conf.d/10periodic, apt/apt.conf.d/local-recommends
  • Adding a default, empty, locale:
    • default/locale
  • DNS/DHCP:
    • dnsmasq.conf, dnsmasq.d/dhcp-ranges, dnsmasq.d/static-ips
    • hosts, resolv.conf
  • Enabling IP forwarding:
    • sysctl.conf
  • Logs related:
    • logrotate.conf, rsyslog.conf
  • MQTT related:
    • mosquitto/mosquitto.users, mosquitto/conf.d/ssl.conf, mosquitto/conf.d/users.conf, mosquitto/mosquitto.acl, mosquitto/mosquitto.conf
    • mqtt-arp.conf
    • ssl/lets-encrypt-r3.crt, ssl/mqtt.home.key, ssl/mqtt.home.crt
  • PPP configuration:
    • ppp/ip-up.d/0000usepeerdns, ppp/ipv6-up.d/defaultroute, ppp/pap-secrets, ppp/chap-secrets
    • network/interfaces.d/pppoe-wan
The router specific config is mostly related to networking:
Breakdown of router specific config
  • Firewalling:
    • nftables.conf
  • Interfaces:
    • dnsmasq.d/interfaces
    • network/interfaces.d/eth0, network/interfaces.d/p1, network/interfaces.d/p2, network/interfaces.d/p7, network/interfaces.d/p8
  • PPP config (network interface piece):
    • ppp/peers/aquiss
  • SSH keys:
    • ssh/ssh_host_ecdsa_key, ssh/ssh_host_ed25519_key, ssh/ssh_host_rsa_key, ssh/ssh_host_ecdsa_key.pub, ssh/ssh_host_ed25519_key.pub, ssh/ssh_host_rsa_key.pub
  • Monitoring:
    • collectd/collectd.conf, collectd/collectd.conf.d/network.conf

8 February 2023

Stephan Lachnit: Setting up fast Debian package builds using sbuild, mmdebstrap and apt-cacher-ng

In this post I will give a quick tutorial on how to set up fast Debian package builds using sbuild with mmdebstrap and apt-cacher-ng. The usual tool for building Debian packages is dpkg-buildpackage, or a user-friendly wrapper like debuild. While these are great tools, if you want to upload something to the Debian archive they lack the required separation from the system they are run on to ensure that your packaging also works on a different system. The usual candidate here is sbuild. But setting up a schroot is tedious and performance tuning can be annoying. There is an alternative backend for sbuild that promises to make everything simpler: unshare. In this tutorial I will show you how to set up sbuild with this backend. In addition to the normal performance tweaking, caching downloaded packages can give a huge performance increase when rebuilding packages. I do rebuilds quite often, mostly when a new dependency got introduced that I didn't specify in debian/control yet, or when lintian notices something I can easily fix. So let's begin with setting up this caching.

Setting up apt-cacher-ng
Install apt-cacher-ng:
sudo apt install apt-cacher-ng
A pop-up will appear; if you are unsure how to answer it, select no, as we don't need it for this use case. To enable apt-cacher-ng on your system, create /etc/apt/apt.conf.d/02proxy and insert:
Acquire::http::proxy "http://127.0.0.1:3142";
Acquire::https::proxy "DIRECT";
In /etc/apt-cacher-ng/acng.conf you can change the value of ExThreshold to hold packages for a shorter or longer duration. The right length depends on your specific use case and resources. A longer threshold takes more disk space, while a short threshold like one day effectively only reduces the build time for rebuilds. If you encounter weird issues on apt update at some point in the future, you can try to clean the cache from apt-cacher-ng. You can use this script:

Setting up mmdebstrap
Install mmdebstrap:
sudo apt install mmdebstrap
We will create a small helper script to ease creating a chroot. Open ~/.local/bin/mmupdate and insert:
#!/bin/sh
mmdebstrap \
  --variant=buildd \
  --aptopt='Acquire::http::proxy "http://127.0.0.1:3142";' \
  --arch=amd64 \
  --components=main,contrib,non-free \
  unstable \
  ~/.cache/sbuild/unstable-amd64.tar.xz \
  http://deb.debian.org/debian
Notes:
  • --aptopt enables apt-cacher-ng inside the chroot.
  • --arch sets the CPU architecture (see Debian Wiki).
  • --components sets the archive components; if you don't want non-free packages you might want to remove some entries here.
  • unstable sets the Debian release; you can also set, for example, bookworm-backports here.
  • unstable-amd64.tar.xz is the output tarball containing the chroot; change it according to your choice of CPU architecture and Debian release.
  • http://deb.debian.org/debian is the Debian mirror; you should set this to the same one you use in your /etc/apt/sources.list.
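If you build for more than one release or architecture, the same call can be wrapped in a function that takes both as parameters. This is only a sketch: the function names chroot_tarball and build_chroot are made up for illustration, while the mmdebstrap options are the ones from the script above.

```shell
#!/bin/sh
# Print the tarball path for a given release and architecture, following
# the <release>-<arch>.tar.xz naming used in the script above.
chroot_tarball() {
	printf '%s/.cache/sbuild/%s-%s.tar.xz\n' "$HOME" "$1" "$2"
}

# Build (or rebuild) the chroot tarball for one release/architecture pair.
# Usage: build_chroot <release> <arch>
build_chroot() {
	release="$1"
	arch="$2"
	mmdebstrap \
		--variant=buildd \
		--aptopt='Acquire::http::proxy "http://127.0.0.1:3142";' \
		--arch="$arch" \
		--components=main,contrib,non-free \
		"$release" \
		"$(chroot_tarball "$release" "$arch")" \
		http://deb.debian.org/debian
}
```

With these definitions, build_chroot unstable amd64 performs the same call as the script above, and a second line such as build_chroot unstable arm64 would add another architecture.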
Make mmupdate executable and run it once:
chmod +x ~/.local/bin/mmupdate
mkdir -p ~/.cache/sbuild
~/.local/bin/mmupdate
If you execute mmupdate again you can see that the downloading stage is much faster thanks to apt-cacher-ng. For me the difference is from about 115s to about 95s. Your results may vary; this depends on the speed of your internet connection, Debian mirror and disk. If you have used the schroot backend and sbuild-update before, you will probably notice that creating a new chroot with mmdebstrap is slower. It would be a bit annoying to do this manually before we start a new Debian packaging session, so let's create a systemd service that does this for us. First create a folder for user services:
mkdir -p ~/.config/systemd/user
Create ~/.config/systemd/user/mmupdate.service and add:
[Unit]
Description=Run mmupdate
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=%h/.local/bin/mmupdate
Start the service and test that it works:
systemctl --user daemon-reload
systemctl --user start mmupdate
systemctl --user status mmupdate
Create ~/.config/systemd/user/mmupdate.timer:
[Unit]
Description=Run mmupdate daily
[Timer]
OnCalendar=daily
Persistent=true
[Install]
WantedBy=timers.target
Enable the timer:
systemctl --user enable mmupdate.timer
Now mmupdate will be run automatically every day. You can adjust the period if you think daily rebuilds are a bit excessive. A neat advantage of periodic rebuilds is that they keep the base files in your apt-cacher-ng cache warm every time they run.

Setting up sbuild
Install sbuild and (optionally) autopkgtest:
sudo apt install --no-install-recommends sbuild autopkgtest
Create ~/.sbuildrc and insert:
# backend for using mmdebstrap chroots
$chroot_mode = 'unshare';
# build in tmpfs
$unshare_tmpdir_template = '/dev/shm/tmp.sbuild.XXXXXXXX';
# upgrade before starting build
$apt_update = 1;
$apt_upgrade = 1;
# build everything including source for source-only uploads
$build_arch_all = 1;
$build_arch_any = 1;
$build_source = 1;
$source_only_changes = 1;
# go to shell on failure instead of exiting
$external_commands = { "build-failed-commands" => [ [ '%SBUILD_SHELL' ] ] };
# always clean build dir, even on failure
$purge_build_directory = "always";
# run lintian
$run_lintian = 1;
$lintian_opts = [ '-i', '-I', '-E', '--pedantic' ];
# do not run piuparts
$run_piuparts = 0;
# run autopkgtest
$run_autopkgtest = 1;
$autopkgtest_root_args = '';
$autopkgtest_opts = [ '--apt-upgrade', '--', 'unshare', '--release', '%r', '--arch', '%a', '--prefix=/dev/shm/tmp.autopkgtest.' ];
# set uploader for correct signing
$uploader_name = 'Stephan Lachnit <stephanlachnit@debian.org>';
You should adjust uploader_name. If you don't want to run autopkgtest or lintian by default, you can also disable them here. Note that for packages that need a lot of space for building, you might want to comment out the unshare_tmpdir_template line to prevent an OOM build failure. You can now build your Debian packages with the sbuild command :)

Finishing touches
You can add these variables to your ~/.bashrc as a bonus (with adjusted name / email):
export DEBFULLNAME="<your_name>"
export DEBEMAIL="<your_email>"
export DEB_BUILD_OPTIONS="parallel=<threads>"
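For the parallel value, a common choice is the number of available CPUs. As a small sketch (using nproc from GNU coreutils is an assumption here, not something the original setup prescribes):

```shell
# Use one build job per available CPU; nproc is part of GNU coreutils.
export DEB_BUILD_OPTIONS="parallel=$(nproc)"
```

This replaces the <threads> placeholder with a number computed each time the shell starts.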
In particular adjust the value of parallel to ensure parallel builds. If you are new to signing / uploading your package, first install the required tools:
sudo apt install devscripts dput-ng
Create ~/.devscripts and insert:
DEBSIGN_KEYID=<your_gpg_fingerprint>
USCAN_SYMLINK=rename
You can now sign the .changes file with:
debsign ../<pkgname_version_arch>.changes
And for source-only uploads with:
debsign -S ../<pkgname_version_arch>_source.changes
If you don't introduce a new binary package, you always want to go with source-only changes. You can now upload the package to Debian with:
dput ../<filename>.changes

Update February 22nd
Jochen Sprickerhof, who originally advised me to use the unshare backend, commented that one can also use --include=auto-apt-proxy instead of the --aptopt option in mmdebstrap to detect apt proxies automatically. He also let me know that it is possible to use autopkgtest on tmpfs (the config in this blog post has been updated) and added an entry on the sbuild wiki page on how to set up sbuild+unshare with ccache if you often need to build a large package. Further, using --variant=apt and --include=build-essential will produce smaller build chroots if desired. Conversely, one can of course also use the --include option to include debhelper and lintian (or any other packages you like) to further decrease the setup time. However, staying with the buildd variant is a good choice for official uploads.

Resources for further reading
https://wiki.debian.org/sbuild
https://www.unix-ag.uni-kl.de/~bloch/acng/html/index.html
https://wiki.ubuntu.com/SimpleSbuild
https://wiki.archlinux.org/title/Systemd/Timers
https://manpages.debian.org/unstable/autopkgtest/autopkgtest-virt-unshare.1.en.html
Thanks for reading!

6 February 2023

Reproducible Builds: Reproducible Builds in January 2023

Welcome to the first report for 2023 from the Reproducible Builds project! In these reports we try and outline the most important things that we have been up to over the past month, as well as the most important things in/around the community. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws can be deliberately introduced during compilation and distribution of the software that we run on our devices. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.


News
In a curious turn of events, GitHub first announced this month that the checksums of various Git archives may be subject to change, specifically because:
the default compression for Git archives has recently changed. As result, archives downloaded from GitHub may have different checksums even though the contents are completely unchanged.
This change (which was brought up on our mailing list last October) would have had quite wide-ranging implications for anyone wishing to validate and verify downloaded archives using cryptographic signatures. However, GitHub reversed this decision, updating their original announcement with a message that "We are reverting this change for now. More details to follow." It appears that this was informed in part by an in-depth discussion in the GitHub Community issue tracker.
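One way to make a check robust against this kind of change is to hash the decompressed stream instead of the compressed file. The following is only a sketch of that idea, not something GitHub or the Reproducible Builds project prescribes; the function name is illustrative, and gzip and sha256sum are assumed to be installed.

```shell
# Print the SHA-256 of the decompressed contents of a gzip-compressed
# file, so that the result does not depend on the compression settings.
content_sha256() {
	gzip -dc "$1" | sha256sum | cut -d ' ' -f 1
}
```

Two archives that differ only in how they were compressed give the same value here. This does not help when a checksum or signature was made over the compressed file itself, which is the situation that made the announced change so disruptive.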
The Bundesamt für Sicherheit in der Informationstechnik (BSI) (trans: "The Federal Office for Information Security") is the agency in charge of managing computer and communication security for the German federal government. They recently produced a report that touches on attacks on software supply-chains (Supply-Chain-Angriff). (German PDF)
Contributor Seb35 updated our website to fix broken links to Tails' Git repository, and Holger updated a large number of pages around our recent summit in Venice.
Noak Jönsson has written an interesting paper entitled The State of Software Diversity in the Software Supply Chain of Ethereum Clients. As the paper outlines:
In this report, the software supply chains of the most popular Ethereum clients are cataloged and analyzed. The dependency graphs of Ethereum clients developed in Go, Rust, and Java, are studied. These client are Geth, Prysm, OpenEthereum, Lighthouse, Besu, and Teku. To do so, their dependency graphs are transformed into a unified format. Quantitative metrics are used to depict the software supply chain of the blockchain. The results show a clear difference in the size of the software supply chain required for the execution layer and consensus layer of Ethereum.

Yongkui Han posted to our mailing list discussing making reproducible builds & GitBOM work together without gitBOM-ID embedding. GitBOM (now renamed to OmniBOR) is a project to enable automatic, verifiable artifact resolution across today's diverse software supply-chains. In addition, Fabian Keil wrote to us asking whether anyone in the community would be at Chemnitz Linux Days 2023, which is due to take place on 11th and 12th March (event info). Separate to this, Akihiro Suda posted to our mailing list just after the end of the month with a status report of bit-for-bit reproducible Docker/OCI images. As Akihiro mentions in their post, they will be giving a talk at FOSDEM in the Containers devroom titled Bit-for-bit reproducible builds with Dockerfile and that "my talk will also mention how to pin the apt/dnf/apk/pacman packages with my repro-get tool."
The extremely popular Signal messenger app added upstream support for the SOURCE_DATE_EPOCH environment variable this month. This means that release tarballs of the Signal desktop client do not embed nondeterministic release information.
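To illustrate the general idea behind SOURCE_DATE_EPOCH (this is a generic sketch, not Signal's actual change), a build script can derive any timestamp it embeds from that variable and only fall back to the current time when it is unset. The function name is illustrative and the -d "@SECONDS" syntax assumes GNU date.

```shell
# Print a build timestamp in UTC. If SOURCE_DATE_EPOCH is set, use it so
# that repeated builds embed the same value; otherwise fall back to the
# current time. Relies on the -d "@SECONDS" syntax of GNU date.
build_timestamp() {
	date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" '+%Y-%m-%dT%H:%M:%SZ'
}
```

With SOURCE_DATE_EPOCH=0 exported, build_timestamp prints 1970-01-01T00:00:00Z on every run, which is what makes the resulting artifact independent of when it was built.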

Distribution work

F-Droid & Android
There was a very large number of changes in the F-Droid and wider Android ecosystem this month: On January 15th, a blog post entitled Towards a reproducible F-Droid was published on the F-Droid website, outlining the reasons why F-Droid signs published APKs with its own keys and how reproducible builds allow using upstream developers' keys instead. In particular:
In response to [...] criticisms, we started encouraging new apps to enable reproducible builds. It turns out that reproducible builds are not so difficult to achieve for many apps. In the past few months we've gotten many more reproducible apps in F-Droid than before. Currently we can't highlight which apps are reproducible in the client, so maybe you haven't noticed that there are many new apps signed with upstream developers' keys.
(There was a discussion about this post on Hacker News.) In addition:
  • F-Droid added 13 apps published with reproducible builds this month.
  • FC Stegerman outlined a bug where baseline.profm files are nondeterministic, developed a workaround, and provided all the details required for a fix. As they note, this issue has now been fixed but the fix is not yet part of an official Android Gradle plugin release.
  • GitLab user Parwor discovered that the number of CPU cores can affect the reproducibility of .dex files.
  • FC Stegerman also announced the 0.2.0 and 0.2.1 releases of reproducible-apk-tools, a suite of tools to help make .apk files reproducible. Several new subcommands and scripts were added, and a number of bugs were fixed as well. They also updated the F-Droid website to improve the reproducibility-related documentation.
  • On the F-Droid issue tracker, FC Stegerman discussed reproducible builds with one of the developers of the Threema messenger app and reported that Android SDK build-tools 31.0.0 and 32.0.0 (unlike earlier and later versions) have a zipalign command that produces incorrect padding.
  • A number of bugs related to reproducibility were discovered in Android itself: firstly, the non-deterministic order of .zip entries in .apk files, and secondly, newline differences between building on Windows versus Linux that can make builds not reproducible as well. (Note that these links may require a Google account to view.)
  • And just before the end of the month, FC Stegerman started a thread on our mailing list on the topic of hiding data/code in APK embedded signatures, which has been made possible by the Android APK Signature Scheme v2/v3. As part of this, they made an Android app called sigblock-code-poc that reads the APK Signing block of its own APK and extracts a payload in order to alter its behaviour.

Debian
As mentioned in last month's report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). During January, a sprint took place on the 10th, resulting in a number of uploads. During this sprint, Holger Levsen filed Debian bug #1028615 to request that the tracker.debian.org service display results of reproducible rebuilds, not just reproducible CI results.
Elsewhere in Debian, strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, version 1.13.1-1 was uploaded to Debian unstable by Holger Levsen, including a fix by FC Stegerman (obfusk) to update a regular expression for the latest version of file(1) (#1028892).
Lastly, 65 reviews of Debian packages were added, 21 were updated and 35 were removed this month, adding to our knowledge about identified issues.

Other distributions
In other distributions:

diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 231, 232, 233 and 234 to Debian:
  • No need for the from __future__ import print_function import anymore.
  • Comment and tidy the extras_require.json handling.
  • Split inline Python code to generate test Recommends into a separate Python script.
  • Update debian/tests/control after merging support for PyPDF.
  • Correctly catch segfaulting cd-iccdump binary.
  • Drop some old debugging code.
  • Allow ICC tests to (temporarily) fail.
In addition, FC Stegerman (obfusk) made a number of changes, including:
  • Updating the test_text_proper_indentation test to support the latest version(s) of file(1).
  • Use an extras_require.json file to store some build/release metadata, instead of accessing the internet.
  • Updating an APK-related file(1) regular expression.
  • On the diffoscope.org website, de-duplicate contributors by e-mail.
Lastly, Sam James added support for PyPDF version 3 and Vagrant Cascadian updated a handful of tool references for GNU Guix.

Upstream patches
The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Testing framework
The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, the following changes were made by Holger Levsen:
  • Node changes:
  • Debian-related changes:
    • Only keep diffoscope's HTML output (i.e. no .json or .txt) for LTS suites and older in order to save disk space on the Jenkins host.
    • Re-create pbuilder base less frequently for the stretch, bookworm and experimental suites.
  • OpenWrt-related changes:
    • Add gcc-multilib to OPENWRT_HOST_PACKAGES and install it on the nodes that need it.
    • Detect more problems in the health check when failing to build OpenWrt.
  • Misc changes:
    • Update the chroot-run script to correctly manage /dev and /dev/pts.
    • Update the Jenkins shell monitor script to collect disk stats less frequently and to include various directory stats.
    • Update the real year in the configuration in order to be able to detect whether a node is running in the future or not.
    • Bump copyright years in the default page footer.
In addition, Christian Marangi submitted a patch to build OpenWrt packages with the V=s flag to enable debugging.
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

5 February 2023

Russell Coker: Wayland in Bookworm

We are getting towards the freeze for Debian/Bookworm so the current state of packages isn't going to change much before the release. Bugs will get fixed but missing features will mostly be missing until the next release.
Anarcat wrote an excellent blog post about using Wayland with the Sway window manager [1]. It seems pretty good if you like Sway, but I like KDE and plan to continue using it. Several of the important utility programs referenced by Anarcat won't run with KDE/Wayland and give errors such as "Compositor doesn't support wlr-output-management-unstable-v1". One noteworthy thing about Wayland is that the window manager and the equivalent to the X server are the same program, so KDE has different Wayland code than Sway and doesn't support some features. The lack of these features limits my ability to manage multiple displays and therefore makes KDE/Wayland unsuitable for many laptop uses. My work laptop runs Ubuntu 22.04 with KDE and wouldn't correctly display on the pair of monitors on a USB-C dock that's the standard desktop configuration where I work.
In my previous post about Wayland [2] I wrote about converting 2 of my systems to Wayland. Since then I had changed them back to X because of problems with supporting strange monitor configurations on laptops and also due to the KDE window manager crashing occasionally, which terminates the session in Wayland but merely requires restarting the window manager in X. More recently I had a problem with the GPU in my main workstation sometimes not being recognised by the system (reporting no PCIe device). When I got a new one I couldn't get X to work, with the error "Cannot run in framebuffer mode. Please specify busIDs for all framebuffer devices", so I tried Wayland again. Now in the later stage of the Bookworm development process it seems that the problem with the KDE window manager crashing has been solved or mitigated, and there is a new problem of the plasmashell process crashing.
As I can restart plasmashell without logging out, that's much less annoying. So now my main workstation is running on Wayland with a slower GPU than I previously had while also giving a faster user experience, so Wayland is providing a definite performance benefit. Maybe for Trixie (the next release of Debian after Bookworm) we should have a release goal of having full Wayland support in all the major GUI systems.

30 January 2023

Arturo Borrero González: Debian and the adventure of the screen resolution

I read somewhere a nice meme about Linux: "Do you want an operating system or do you want an adventure?" I love it, because it is so true. What you are about to read is my adventure to set a usable screen resolution in a fresh Debian testing installation.
The context is that I have two different Lenovo Thinkpad laptops with 16" screens and nvidia graphic cards. They are both installed with the latest Debian testing. I use the closed-source nvidia drivers (they seem to work better than the nouveau module). The desktop manager and environment that I use is lightdm + XFCE4. The monitor native resolution in both machines is very high: 3840x2160 (or 4K UHD if you will).
The thing is that both laptops show an identical problem: when freshly installed with the Debian default config, the native resolution is in use. For a 16" screen laptop, this high resolution means that the font is tiny. Therefore, the raw native resolution renders the machine almost unusable. This is a picture of what you get by running htop in the console (tty1, the terminal you would get by hitting CTRL+ALT+F1) with the default install:
[Image: Linux tty console with high resolution and tiny fonts]
Everything in the system is affected by this:
  1. the grub menu is unreadable. Thankfully the right option is selected by default.
  2. the tty console, with the boot splash by systemd, is unreadable as well. There are some colors, so you at least see some systemd stuff happening in green.
  3. when lightdm starts, the resolution keeps being very high. I can barely click the login button.
  4. when XFCE4 starts, it is a pain to navigate the menu and click the right buttons to set a more reasonable resolution.
The adventure begins after installing the system. Each of these four points must be fixed by hand by the user.
XFCE4
Point #4 is the easiest. Navigate with the mouse pointer to the tiny Applications menu, then Settings, then Displays. This is more or less the same in every other desktop operating system. There are no further actions required to persist this setting. Thank you XFCE4.
lightdm
Point #3, about lightdm, is trickier to solve. It involves running xrandr when lightdm sets up the display. Nobody will tell you this trick. You have to search for it on the internet. Thankfully it is a common problem, and a person who knows what to search for can find good results. The file /etc/lightdm/lightdm.conf needs to contain something like this:
[LightDM]
[Seat:*]
# set up correct display resolution
display-setup-script=sh -c -- "xrandr -s 1920x1080"
By the way, depending on your system hardware setup, you may also need an additional call to xrandr here. If you want to plug in an HDMI monitor, chances are you require something like xrandr --setprovideroutputsource NVIDIA-G0 modesetting && xrandr --auto to instruct the NVIDIA graphic card to work well with the kernel graphic system. In my case, one of my laptops requires it, so I have:
[LightDM]
[Seat:*]
# don't ask me to type my username
greeter-hide-users=false
# set up correct display resolution, and prepare NVIDIA card for HDMI output
display-setup-script=sh -c "xrandr -s 1920x1080 && xrandr --setprovideroutputsource NVIDIA-G0 modesetting && xrandr --auto"
grub
Point #1 about the grub menu is also not trivial to solve, but also widely known on the internet. Grub allows you to set arbitrary graphical modes. In Debian systems, adding something like GRUB_GFXMODE=1024x768 to /etc/default/grub and then running sudo update-grub should do the magic.
console
So we get to point #2 about the tty1 console. For months, I've been investing my scarce personal time into trying to solve this annoyance. There is a lot of conflicting information about this on the internet. Plenty of misleading solutions, essays about framebuffer, kernel modeset, and other stuff I don't want to read just to get my tty1 in a readable status.
People point in different directions, like using GRUB_GFXPAYLOAD_LINUX=keep in /etc/default/grub. Which is a good solution, but won't work: my best bet is that the kernel indeed keeps the resolution as told by grub, but the moment systemd loads the nvidia driver, it enables 4K in the display and the console gets the high resolution.
Actually, for a few weeks, I blamed plymouth. Because the plymouth service is loaded early by systemd, it could be responsible for setting some of the display settings. It actually contains some (undocumented) DeviceScale configuration option that is seemingly aimed at high resolution scenarios. I played with it to no avail.
Some folks from IRC suggested reconfiguring the console-font package, which was unknown to me back then. Running sudo dpkg-reconfigure console-font would indeed show a menu to select some preferences for the console, including font size. But apparently, a freshly installed system already uses the biggest possible, so this was a dead end.
Another option I evaluated for a few days was touching the kernel framebuffer setting. I honestly don't understand this, and all the solutions pointing to use fbset didn't work for me anyway. This is the default framebuffer configuration in one of the laptops:
user@debian:~$ fbset -i

mode "3840x2160"
    geometry 3840 2160 3840 2160 32
    timings 0 0 0 0 0 0 0
    accel true
    rgba 8/16,8/8,8/0,0/0
endmode
Frame buffer device information:
    Name        : i915drmfb
    Address     : 0
    Size        : 33177600
    Type        : PACKED PIXELS
    Visual      : TRUECOLOR
    XPanStep    : 1
    YPanStep    : 1
    YWrapStep   : 0
    LineLength  : 15360
    Accelerator : No
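As a side note on reading this output, the reported LineLength and Size follow directly from the geometry line: one scanline takes width times bits per pixel divided by 8 bytes, and the whole buffer is that value times the height. A small sketch of the arithmetic (the function names are made up for illustration):

```shell
# Bytes per scanline: width in pixels times bits per pixel, divided by 8.
# Usage: fb_line_length <width> <bits_per_pixel>
fb_line_length() {
	echo $(( $1 * $2 / 8 ))
}

# Total framebuffer size in bytes: bytes per scanline times height.
# Usage: fb_size <width> <height> <bits_per_pixel>
fb_size() {
	echo $(( $(fb_line_length "$1" "$3") * $2 ))
}
```

For the 3840x2160 mode at 32 bits per pixel shown above, fb_line_length 3840 32 gives 15360 and fb_size 3840 2160 32 gives 33177600, matching the values fbset reports.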
Playing with these numbers, I was able to modify the geometry of the console, only to reduce the panel to a tiny square in the console display (with equally small fonts anyway). If it was possible to scale or resize the panel in some other way, I was unable to understand how to do so by reading the associated docs.
One day, out of despair, I tried disabling kernel modesetting (or KMS). It indeed got me a more readable tty1, only to prevent the whole graphic stack from starting, with Xorg complaining about the lack of kernel modeset.
After lots of wasted time, I decided to blame the NVIDIA graphic card. Because why not: a closed source module in my system looks fishy. I registered in their official forum and wrote a message about my suspicion on the module, asking for advice on how to modify the driver default resolution. I was hoping that something like modprobe nvidia my_desired_resolution=1920x1080 could exist. Apparently not :-(
I was about to give up. I had walked every corner of the known internet. I even tried summoning the ancient gods: I used ChatGPT. I asked the AI god for mercy, for a working solution, to no avail.
Then I decided to change the kind of queries I was issuing to the search engines (don't ask me, I no longer remember). Eventually I landed on this askubuntu.com page. The question described the exact same problem I was experiencing. Finally, that was encouraging! I was not alone in my adventure after all!
The solution section included a font size I hadn't seen before in my previous tests: 16x32. More excitement! I did all the steps. I installed the xfonts-terminus package, and in the file /etc/default/console-setup I put:
ACTIVE_CONSOLES="/dev/tty[1-6]"
CHARMAP="ISO-8859-15"
CODESET="guess"
FONTFACE="Terminus"
FONTSIZE="16x32"
VIDEOMODE=
Then I ran setupcon from a tty, and the miracle happened! I finally got a bigger font in the tty1 console! It turned out that a potential solution was about playing with console-setup, which I had tried without success before. I'm not even sure if the additional package was required. This is how my console looks now:
[Image: Linux tty console with high resolution but not so tiny fonts]
The truth is the solution is satisfying only to a degree. I'm a person with good eyesight and can work with these slightly larger fonts. I'm not sure if I can get larger fonts using this method, honestly. After some searching, I discovered that some folks already managed to describe the problem in detail and filed a proper bug report in Debian, see #595696, opened more than 10 years ago.
2023 is the year of linux on the desktop
Nope. I honestly don't see how this disconnected pile of settings can all be reconciled together. Can we please have a systemd-whatever that homogenizes all of this mess? I'm referring to grub + kernel drivers + console + lightdm + XFCE4.
Next adventure
When I lock the desktop (with CTRL+ALT+L) and close the laptop lid to suspend it, then reopen it and type the login info into the lightdm greeter, the desktop environment never loads: black screen. I have already tried the first few search results without luck. Perhaps the nvidia card is to blame this time? Perhaps poorly coupled power management by the different system software pieces? Who knows what's going on here. This will probably be my next Debian desktop adventure.

20 January 2023

Reproducible Builds (diffoscope): diffoscope 233 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 233. This version includes the following changes:
[ FC Stegerman ]
* Split packaging metadata into an extras_require.json file instead of using
  the pep517 and the pip modules directly. This was causing build failures if
  not using a virtualenv and/or building without internet access.
  (Closes: #1029066, reproducible-builds/diffoscope#325)
[ Vagrant Cascadian ]
* Add an external tool reference for GNU Guix (lzip).
* Drop an external tool reference for GNU Guix (pedump).
[ Chris Lamb ]
* Split inline Python code in shell script to generate test dependencies to a
  separate Python script.
* No need for "from __future__ import print_function" import in setup.py
  anymore.
* Comment and tidy the new extras_require.json handling.
You can find out more by visiting the project homepage.

14 January 2023

Ian Jackson: SGO (and my) VPN and network access tools - in bookworm

Recently, we managed to get secnet and hippotat into Debian. They are on track to go into Debian bookworm. This completes in Debian the set of VPN/networking tools I (and other Greenend folks) have been using for many years. The Sinister Greenend Organisation's suite of network access tools consists mainly of:

secnet

secnet is our very mature VPN system. Its basic protocol idea is similar to that in Wireguard, but it's much older. Differences from Wireguard include:

secnet was originally written by Stephen Early, starting in 1996 or so. I inherited it some years ago and have been maintaining it since. It's mostly written in C.

Hippotat

Hippotat is best described by copying the intro from the docs:
Hippotat is a system to allow you to use your normal VPN, ssh, and other applications, even in broken network environments that are only ever tested with "web stuff". Packets are parcelled up into HTTP POST requests, resembling form submissions (or JavaScript XMLHttpRequest traffic), and the returned packets arrive via the HTTP response bodies.
It doesn't rely on TLS tunnelling so can work even if the local network is trying to intercept TLS. I recently rewrote Hippotat in Rust.

userv ipif

userv ipif is one of the userv utilities. It allows safe delegation of network routing to unprivileged users. The delegation is of a specific address range, so different ranges can be delegated to different users, and the authorised user cannot interfere with other traffic. This is used in the default configuration of hippotat packages, so that an ordinary user can start up the hippotat client as needed. On chiark userv-ipif is used to delegate networking to users, including administrators of allied VPN realms. So chiark actually runs at least 4 VPN-ish systems in production: secnet, hippotat, Mark Wooding's Tripe, and still a few links managed by the now-superseded udptunnel system.

userv

userv ipif is a userv service. That is, it is a facility which uses userv to bridge a privilege boundary. userv is perhaps my most under-appreciated program. userv can be used to straightforwardly bridge (local) privilege boundaries on Unix systems. So for example it can:

userv services can be defined by the called user, not only by the system administrator. This allows a user to reconfigure or divert a system-provided default implementation, and even allows users to define and implement ad-hoc services of their own. (Although, the system administrator can override user config.)

Acknowledgements

Thanks for the help I had in this effort. In particular, thanks to Sean Whitton for encouragement, and the ftpmaster review; and to the Debian Rust Team for their help navigating the complexities of handling Rust packages within the Debian Rust Team workflow.


12 January 2023

Jonathan McDowell: Building a read-only Debian root setup: Part 1

I mentioned in the post about upgrading my home internet that part of the work I did was creating a read-only Debian root with a squashfs image. This post covers the details of how I boot with that image; a later post will cover how I build the squashfs image.

First, David Reader kindly pointed me at his rodebian setup, which was helpful in making me think about the whole problem but ultimately not the direction I went, primarily because on the old router (an RB3011) I am space constrained, with only 120M of usable flash, and so ideally I wanted as much as possible of the system in a well compressed filesystem. squashfs seemed like the best option for that, and ultimately I ended up with a 39M image. I've then used overlayfs to mount a tmpfs, so I get what looks like a writeable system without having to do too many tweaks to the actual install. On the plus side I can then see exactly what is getting written where and decide whether I need to update something in the squashfs.

I don't boot with an initrd - for initial testing I booted directly off a USB stick. I've actually ended up continuing to do this in production, because I've had no pressing reason to move it all to booting off internal flash (I've ended up with a Sandisk SDCZ430-032G-G46 which is tiny). However nothing I'm going to describe is dependent on that - this would work perfectly well for an initial UBIFS rootfs on internal NAND.

So the basic overview is I boot off a minimal rootfs, mount a squashfs, create an appropriate tmpfs, mount an overlayfs that combines the two, then pivot_root into the overlayfs and exec its init so it becomes the rootfs.

For the minimal rootfs I started with busybox, in particular I used the armhf busybox-static package from Debian. My RB5009 is an ARM64, but I wanted to be able to test on the RB3011 as well, which is ARMv7. Picking an armhf binary for the minimal rootfs lets me use the same image for both.
Using the static build helps reduce the number of pieces involved in putting it all together. The busybox binary goes in /bin. I was able to cheat and chroot into the empty rootfs and call busybox --install -s to create symlinks for all the tools it provides, but I could have done this manually. There's only a handful that are actually needed, but it's amazing how much is crammed into a 1.2M binary. /sbin/init is a shell script:
#!/bin/ash
# Make sure we have a sane date
if [ -e /data/saved-date ]; then
        CURRENT_DATE=$(date -Iseconds)
        if [ "${CURRENT_DATE:0:4}" -lt "2022" -o \
                        "${CURRENT_DATE:0:4}" -gt "2030" ]; then
                echo Setting initial date
                date -s "$(cat /data/saved-date)"
        fi
fi
# Work out what platform we're on
ARCH=$(uname -m)
if [ "${ARCH}" == "aarch64" ]; then
        ARCH=arm64
else
        ARCH=armhf
fi
# Mount a tmpfs to store the changes
mount -t tmpfs root-rw /mnt/overlay/rw
# Make the directories we need in the tmpfs
mkdir /mnt/overlay/rw/upper
mkdir /mnt/overlay/rw/work
# Mount the squashfs and build an overlay root filesystem of it + the tmpfs
mount -t squashfs -o loop /data/router.${ARCH}.squashfs /mnt/overlay/lower
mount -t overlay \
        -o lowerdir=/mnt/overlay/lower,upperdir=/mnt/overlay/rw/upper,workdir=/mnt/overlay/rw/work \
        overlayfs-root /mnt/root
# Build the directories we need within the new root
mkdir /mnt/root/mnt/flash
mkdir /mnt/root/mnt/overlay
mkdir /mnt/root/mnt/overlay/lower
mkdir /mnt/root/mnt/overlay/rw
# Copy any stored state
if [ -e /data/state.${ARCH}.tar ]; then
        echo Restoring stored state
        cd /mnt/root
        tar xf /data/state.${ARCH}.tar
fi
cd /mnt/root
pivot_root . mnt/flash
echo Switching into root filesystem
exec chroot . sh -c "$(cat <<END
mount --move /mnt/flash/mnt/overlay/lower /mnt/overlay/lower
mount --move /mnt/flash/mnt/overlay/rw /mnt/overlay/rw
exec /sbin/init
END
)"
Most of what the script is doing is sorting out the squashfs + tmpfs backed overlayfs that becomes the full root filesystem, but there are a few other bits to note. First, we pick up a saved date from /data/saved-date - the router has no RTC and while it'll sort itself out with NTP once it gets networking up it's useful to make sure we don't end up comically far in the past or future. Second, the script looks at what architecture we're running and picks up an appropriate squashfs image from /data based on that. This let me use the same USB stick for testing on both the RB3011 and the RB5009. Finally we allow for a /data/state.${ARCH}.tar file to let us pick up changes to the rootfs at boot time - this prevents having to rebuild the squashfs image every time there's a persistent change.

The other piece that doesn't show up in the script is that the kernel and its modules are all installed into this initial rootfs (and then symlinked from the squashfs). This lets me build a mostly modular kernel, as long as all the necessary drivers to mount the USB stick are built in.

Once the system is fully booted the initial rootfs is available at /mnt/flash, by default mounted read-only (to avoid inadvertent writes), but able to be remounted to update the squashfs image, install a new kernel, or update the state tarball. /mnt/overlay/rw/upper/ is where updates to the overlayfs are written, which provides an easy way to see what files are changing, initially to determine what might need tweaked in the squashfs creation process and subsequently to be able to see what needs updated in the state tarball.
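The post does not show how the state tarball is produced, so the following is only a sketch of one way the upper directory could be inspected and a tarball assembled from it. The function names list_overlay_changes and build_state_tar are hypothetical, and the sketch deliberately ignores overlayfs whiteout entries (deleted files), which a complete tool would need to handle.

```shell
#!/bin/sh
# list_overlay_changes UPPER
# Print every regular file or symlink found under the overlay's upper
# directory, as a sorted list of paths relative to the root filesystem.
list_overlay_changes() {
    ( cd "$1" && find . \( -type f -o -type l \) | sed 's|^\./||' | sort )
}

# build_state_tar UPPER OUT PATH...
# Pack the chosen relative PATHs from UPPER into the tarball OUT. The paths
# are stored relative to UPPER, so extracting in the new root (as the init
# script does with "tar xf") puts them back in the right place.
build_state_tar() {
    upper=$1 out=$2
    shift 2
    tar -cf "$out" -C "$upper" "$@"
}
```

On a running system this might look like list_overlay_changes /mnt/overlay/rw/upper to review what has changed, then, after remounting /mnt/flash read-write, build_state_tar /mnt/overlay/rw/upper followed by the tarball location and the paths worth keeping. The exact tarball location under /mnt/flash depends on how the initial rootfs is laid out.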

6 January 2023

Jonathan McDowell: Finally making use of bpftrace

I am old enough to remember when BPF meant the traditional Berkeley Packet Filter, and was confined to filtering network packets. It's grown into much, much, more as eBPF and getting familiar with it so that I can add it to the suite of tips and tricks I can call upon has been on my to-do list for a while. To this end I was lucky enough to attend a live walk through of bpftrace last year. bpftrace is a high level tool that allows the easy creation and execution of eBPF tracers under Linux.

Recently I've been working on updating the RetroArch packages in Debian and as I was doing so I realised there was a need to update the quite outdated retroarch-assets package, which contains various icons and images used for the user interface. I wanted to try and re-generate as many of the artefacts as I could, to ensure the proper source was available. However it wasn't always clear which files were actually needed and which were either source or legacy. So I wanted to trace file opens by retroarch and see when it was failing to find files.

Traditionally this is something I'd have used strace for, but it seemed like a great opportunity to try out bpftrace. It turns out bpftrace ships with an example, opensnoop.bt which provided details of hooking the open syscall entry + exit and providing details of all files opened on the system. I only wanted to track opens by the retroarch binary that failed, so I made a couple of modifications:
retro-failed-open-snoop.bt
#!/usr/bin/env bpftrace
/*
 * retro-failed-open-snoop - snoop failed opens by RetroArch
 *
 * Based on:
 * opensnoop	Trace open() syscalls.
 *		For Linux, uses bpftrace and eBPF.
 *
 * Copyright 2018 Netflix, Inc.
 * Licensed under the Apache License, Version 2.0 (the "License")
 *
 * 08-Sep-2018	Brendan Gregg	Created this.
 */
BEGIN
{
	printf("Tracing open syscalls... Hit Ctrl-C to end.\n");
	printf("%-6s %-16s %3s %s\n", "PID", "COMM", "ERR", "PATH");
}

tracepoint:syscalls:sys_enter_open,
tracepoint:syscalls:sys_enter_openat
{
	@filename[tid] = args->filename;
}

tracepoint:syscalls:sys_exit_open,
tracepoint:syscalls:sys_exit_openat
/@filename[tid]/
{
	$ret = args->ret;
	$errno = $ret > 0 ? 0 : - $ret;
	if (($ret <= 0) && (strncmp("retroarch", comm, 9) == 0) ) {
		printf("%-6d %-16s %3d %s\n", pid, comm, $errno,
		    str(@filename[tid]));
	}
	delete(@filename[tid]);
}

END
{
	clear(@filename);
}
I had to install bpftrace (apt install bpftrace) and then I ran bpftrace -o retro.log retro-failed-open-snoop.bt as root and fired up retroarch as a normal user.
bpftrace failed open log for retroarch
Attaching 6 probes...
Tracing open syscalls... Hit Ctrl-C to end.
PID    COMM             ERR PATH
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/glibc-hwcaps/x86-64-v2/lib
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/x86_64/libpulse
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/libpulsecommon-
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/x86_64/libpulsecommon-
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/tls/libpulsecommon-16.1.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/x86_64/libpulsecomm
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/libpulsecommon-16.1
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/pulseaudio/x86_64/libpulsecommon-16.1
3394   retroarch          2 /etc/gcrypt/hwf.deny
3394   retroarch          2 /lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/tls/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libgamemode.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/tls/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libgamemode.so.0
3394   retroarch          2 /lib/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/tls/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/x86_64/libgamemode.so.0
3394   retroarch          2 /lib/libgamemode.so.0
3394   retroarch          2 /usr/lib/glibc-hwcaps/x86-64-v2/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/tls/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/x86_64/libgamemode.so.0
3394   retroarch          2 /usr/lib/libgamemode.so.0
3394   retroarch          2 /lib/x86_64-linux-gnu/libgamemode.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libgamemode.so
3394   retroarch          2 /lib/libgamemode.so
3394   retroarch          2 /usr/lib/libgamemode.so
3394   retroarch          2 /lib/x86_64-linux-gnu/libdecor-0.so
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/libdecor-0.so
3394   retroarch          2 /lib/libdecor-0.so
3394   retroarch          2 /usr/lib/libdecor-0.so
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /usr/lib/x86_64-linux-gnu/dri/tls/iris_dri.so
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/glibc-hwcaps/x86-64-v2/libedit.so.
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/tls/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/x86_64/libedit.so.2
3394   retroarch          2 /lib/x86_64-linux-gnu/../lib/libedit.so.2
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /etc/drirc
3394   retroarch          2 /home/noodles/.drirc
3394   retroarch          2 /home/noodles/.Xdefaults-udon
3394   retroarch          2 /home/noodles/.icons/default/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/default/index.theme
3394   retroarch          2 /usr/share/icons/default/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/default/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/index.theme
3394   retroarch          2 /usr/share/icons/Adwaita/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/Adwaita/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/index.theme
3394   retroarch          2 /usr/share/icons/hicolor/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/cursors/0000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/index.theme
3394   retroarch          2 /home/noodles/.XCompose
3394   retroarch          2 /home/noodles/.icons/default/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/default/index.theme
3394   retroarch          2 /usr/share/icons/default/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/default/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/Adwaita/index.theme
3394   retroarch          2 /usr/share/icons/Adwaita/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/Adwaita/cursors/0000000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/cursors/00000000000000000000000000
3394   retroarch          2 /home/noodles/.icons/hicolor/index.theme
3394   retroarch          2 /usr/share/icons/hicolor/cursors/000000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/cursors/0000000000000000000000000000
3394   retroarch          2 /usr/share/pixmaps/hicolor/index.theme
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/png/disc.png
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/sounds
3394   retroarch          2 /usr/share/libretro/assets/sounds
3394   retroarch          2 /sys/class/power_supply/ACAD
3394   retroarch          2 /sys/class/power_supply/ACAD
3394   retroarch          2 /usr/share/libretro/assets/xmb/monochrome/png/disc.png
3394   retroarch          2 /usr/share/libretro/assets/ozone/sounds
3394   retroarch          2 /usr/share/libretro/assets/sounds
This was incredibly useful - the only theme image I was missing is disc.png from XMB Monochrome (which fails to have SVG source). I also discovered the runtime optional loading of GameMode. This is available in Debian so it was a simple matter to add libgamemode0 to the binary package Recommends. So, a very basic example of using bpftrace, but a remarkably useful intro to it from my point of view!
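A log like the one above is long and repetitive, so a small filter helps when picking out the asset paths. This is a sketch rather than part of the original workflow: failed_paths_under is a hypothetical helper, it assumes the four-column PID/COMM/ERR/PATH layout shown in the log, and it will cut off paths that contain spaces.

```shell
#!/bin/sh
# failed_paths_under LOG PREFIX
# Print the unique PATH values from a bpftrace log in the format above,
# keeping only retroarch lines whose path starts with PREFIX.
failed_paths_under() {
    awk -v prefix="$2" '
        # Column 2 is COMM and column 4 is PATH; header lines never match.
        $2 == "retroarch" && index($4, prefix) == 1 { print $4 }
    ' "$1" | sort -u
}
```

Running failed_paths_under retro.log /usr/share/libretro/ against the log shown here would reduce it to the handful of distinct paths under the assets directory.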

3 January 2023

Paul Wise: FLOSS Activities December 2022

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages: gnome-shell-extension-no-annoyance
  • Debian servers: contact mail server blocking a Debian MX
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The azure-functions-devops-build work was sponsored. All other work was done on a volunteer basis.
